    Time Series Analysis for air pollution data not aligned [R] [P]
    This is about a project I am working on; I hope the ML community can help me! I have collected a few hours of air-pollutant data using Aeroqual sensors and custom-made sensors. Three types of data are available in the project: Aeroqual, custom, and council data. The council data can be treated as ground truth (it comes from a government-installed high-spec sensor). Aeroqual is a commercial sensor manufacturer, so its data should also be accurate. The first part of the project is about checking the accuracy of the custom sensor. I have done some analysis on the data and found that the custom sensor data is similar to the council sensor data (but not the same; there is a lot of variation in the custom sensor readings), while the Aeroqual data is very different. I am attaching the plot I made below. Is there any method I can use to find the relationship between these three datasets? Is it possible to align them? I need to build an ML model to predict air-pollutant levels using this data. Any tips for getting this working? Thanks in advance. https://preview.redd.it/fzu7dbsz0dv81.png?width=885&format=png&auto=webp&s=516d65fe3290ac8a28159547880f9dc972922b64 submitted by /u/Codename_17 [link] [comments]  ( 1 min )
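    One common starting point is to resample all the streams onto a shared time index and check correlation before fitting anything. The sketch below uses synthetic data in place of the actual sensor files; the column names, sample rates, and the linear calibration step are all assumptions for illustration, not details from the post:

    ```python
    import numpy as np
    import pandas as pd

    # Hypothetical stand-ins for two of the sensor streams, sampled every 30s.
    idx = pd.date_range("2022-01-01", periods=120, freq="30s")
    council = pd.Series(np.sin(np.linspace(0, 6, 120)) + 5, index=idx, name="council")
    custom = pd.Series(
        np.sin(np.linspace(0, 6, 120)) * 1.4 + 4.7
        + np.random.default_rng(0).normal(0, 0.2, 120),
        index=idx, name="custom",
    )

    # Align both series onto a common 1-minute grid by averaging within each bin.
    df = pd.concat([council, custom], axis=1).resample("1min").mean().dropna()

    # Pearson correlation tells you whether the shapes agree even if scales differ.
    r = df["council"].corr(df["custom"])

    # A simple linear calibration maps the custom sensor onto the council scale.
    slope, intercept = np.polyfit(df["custom"], df["council"], 1)
    calibrated = slope * df["custom"] + intercept
    ```

    If the correlation is high but the scales differ, a per-sensor linear (or polynomial) calibration against the council reference is often enough before training a model.
    
    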
    [D] For training a Haar cascade, is it better to manually remove noise from positive training images or to leave it in so the data is more realistic?
    submitted by /u/Counter-Business [link] [comments]  ( 1 min )
    [D] How to convert papers to code?
    My problem is probably what you have guessed: understanding the technical specifications, which are usually written in a non-coding-friendly way. Sometimes crucial information is completely missing from the paper, e.g. the loss-function description for a DL algorithm. In the lucky cases where implementations of a given paper are already available on GitHub, they are usually either very different from one another in code structure, which calls into question their validity, or whether they match what the paper's authors intended, especially when their measured results vary, or they are almost exact copies of one another. There are numerous papers of varying complexity where I could show why the conversion is tricky, but those would require standalone discussions, likely outside the scope of this one. Is there a way to approach the problem assuming the absence of reference code? submitted by /u/shine-box [link] [comments]  ( 2 min )
    Open Source Model For Identifying Extremism Online [Project]
    submitted by /u/OppositeMonday [link] [comments]
    [P] Tired of manually sending minutes of meeting
    I host an important org-level meeting (~100 attendees) every week and need to share minutes afterwards. I am so tired of listening to the conversations again just to capture important points and summarise discussion and action items. Is there any model/API that can help me do that? I use Amazon Transcribe to generate transcripts, which helps, but it is not very accurate. For me the priorities would be: 1) a model/API that is better than Amazon Transcribe, 2) automatic speaker identification / speaker diarization (since it's mostly the same set of people speaking), and 3) summarising the conversation into topics (we have time- and agenda-based discussion). I am sure this is a problem across the industry, since most meetings happen online and someone wastes hours afterwards sending notes. I did find some tools that summarise the transcript, but I need to auto-send in a specific format and identify topics based on the conversation (maybe we can input the agenda in advance). Also, this is private information, so I need something on-premise; hence I am looking for a repo or model I can build on top of. Please let me know if something exists or if someone is working on similar projects. Happy to collaborate and contribute. submitted by /u/super_commando-dhruv [link] [comments]  ( 1 min )
    [R] ?? Can you find out which news article is written by AI ??
    This research will test the human ability to distinguish human-written text from text generated by artificial intelligence. Participating will only take 10 minutes. You will receive 2 short news articles about the same topic: one written by a human, the other generated by artificial intelligence. It is up to you to find out which one was written by artificial intelligence. You will be asked to do this for four different subjects: Science, Economics & Politics, Society, and Sports. At the end of the survey you will receive feedback on how well you performed. The human-written articles were collected from various news websites. The AI-created articles were generated using GPT-3 from OpenAI. Purpose of the research: we are trying to find out how well GPT-3 performs across subjects. Are there subjects GPT-3 is better at writing about, or is it equally good across all of them? Secondly, we are testing the ability of GPT-3 to generate articles about events that happened after the model was trained. You can participate by clicking on the link below. Thank you very much for your participation. https://vub.fra1.qualtrics.com/jfe/form/SV_b2E9f6hGxNDH13M submitted by /u/RobinSandersVUB [link] [comments]  ( 1 min )
    [R] I need to run >2000 experiments for my PhD work. How much would 2000 GPUs for 1 day cost?
    2000 GPUs and 8000 CPUs. And where could I even get access to resources on that scale? submitted by /u/samlerman [link] [comments]  ( 2 min )
    [P] Vectorflow is a minimalist neural network library optimized for sparse data and single machine environments open sourced by Netflix
    submitted by /u/ur_mum_goes_to_uni [link] [comments]
    [Project] Face detection algorithms comparison
    I selected 5 ready-made face detection algorithms and compared them against each other using metrics such as precision, recall, IoU, and runtime, on a dataset I annotated myself. I am happy to accept pull requests with your own solutions (algorithms) and results! GitHub: https://github.com/wb-08/face-detection-algorithms-comparison Blog post: https://habr.com/ru/post/661671/ submitted by /u/wb-08 [link] [comments]
    [Discussion] Writing production grade code for ML in python
    I have been interviewing for a machine learning lead position and have successfully passed 3 interview rounds (coding, HR, system design). My final interview is with the VP of Engineering. When I asked how best to prepare, they said they would like to test my ability to write "production quality" code in Python. While I do have some experience, the downside is that I worked in small R&D teams for a long time. Though I am knowledgeable in Python, I may not have followed all the industry best practices. If you are a hiring manager or interviewer, how would you test this ability? How do I prepare myself to demonstrate that I can write production-grade code? Thank you all so much in advance. submitted by /u/mbkv [link] [comments]  ( 4 min )
    [D] Comparing the efficiency of different GAN models
    I'm comparing different GAN models (CGAN, DCGAN, WGAN, StyleGAN) in TensorFlow 2. My goal is to use the images produced by the generator to train a classifier, so they should be as realistic as possible. At first I wanted to train each model for 24 hours, define an early-stopping criterion, and save the checkpoint with the lowest loss through a callback. But it seems that a lower loss does not always lead to more realistic images. So how do I compare the different models in a scientific way? Right now the results depend heavily on which epoch I choose and on my subjective feeling about which images look best. submitted by /u/Bonkikong [link] [comments]  ( 1 min )
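    A more objective comparison than eyeballing samples is the Fréchet Inception Distance (FID): fit a Gaussian to feature vectors of real and generated images and compute the Fréchet distance between the two Gaussians. A minimal sketch of the distance itself follows; random vectors stand in for the Inception-network activations the real metric would use:

    ```python
    import numpy as np
    from scipy import linalg

    def fid(feats_real, feats_fake):
        """Frechet distance between Gaussians fit to two feature sets (n, d)."""
        mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
        s1 = np.cov(feats_real, rowvar=False)
        s2 = np.cov(feats_fake, rowvar=False)
        # Matrix square root of the covariance product; discard tiny imaginary
        # parts that sqrtm can introduce numerically.
        covmean = linalg.sqrtm(s1 @ s2).real
        return float(((mu1 - mu2) ** 2).sum() + np.trace(s1 + s2 - 2 * covmean))

    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, (500, 16))   # stand-in for real-image features
    fake = rng.normal(0.5, 1.0, (500, 16))   # stand-in for generated features
    d_same = fid(real, real)   # identical sets -> distance near zero
    d_diff = fid(real, fake)   # shifted distribution -> clearly larger
    ```

    Lower FID means the generated distribution is closer to the real one, which gives a single number to compare checkpoints and architectures instead of a subjective glance at samples.
    
    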
    [P] Artificial Nightmares: Split Personality || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    https://www.youtube.com/watch?v=2E_6ARbrMmc submitted by /u/Thenamessd [link] [comments]
    [N] Google's new AI image analysis is pretty LiT - and beats OpenAI's CLIP
    submitted by /u/much_successes [link] [comments]
    [P] A Simpler @PyTorch Annotated Implementation of EleutherAI's 20B Language Model GPT-NeoX.
    Github: https://github.com/labmlai/neox Annotated implementation: https://lit.labml.ai/github/labmlai/neox/tree/main/src/neox/__init__.py Original repo from EleutherAI: https://github.com/EleutherAI/gpt-neox We have included samples showing how to generate text and how to fine-tune. To keep things simple, we haven't included many of the optimizations present in the original GPT-NeoX. submitted by /u/hnipun [link] [comments]  ( 1 min )
    [P] treequeues: transfer JAX pytrees between processes at very high speed!
    Hello! If you are using JAX and need to pass pytrees between processes, I may have something for you :) I developed a "treequeue": a queue made for pytrees of nested arrays. The transfer speed is up to 10 times higher than regular queues. This is achieved by using shared-memory arrays and avoiding pickling the data. It can be very useful when developing distributed architectures, e.g. distributed reinforcement learning, where speed is of the utmost importance. In my case this implementation was very useful for removing bottlenecks when implementing PBT algorithms! https://github.com/thomashirtz/treequeues Cheers! submitted by /u/krenast [link] [comments]  ( 1 min )
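    The core trick described above, moving bytes through shared memory instead of pickling them, can be illustrated with Python's standard library alone. This is not the treequeues API, just a sketch of why skipping serialization is faster: the array crosses the process boundary as one memcpy instead of a pickle round-trip.

    ```python
    import numpy as np
    from multiprocessing import shared_memory

    # Producer side: copy an array into a named shared-memory block (no pickling).
    src = np.arange(12, dtype=np.float32).reshape(3, 4)
    shm = shared_memory.SharedMemory(create=True, size=src.nbytes)
    buf = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
    buf[:] = src  # a single memcpy instead of serialize/deserialize

    # Consumer side (would normally run in another process): attach by name.
    shm2 = shared_memory.SharedMemory(name=shm.name)
    view = np.ndarray(src.shape, dtype=src.dtype, buffer=shm2.buf)
    out = view.copy()  # read the data out before releasing the block

    shm2.close()
    shm.close()
    shm.unlink()
    ```

    A pytree queue built on this idea would allocate one such block per leaf array and only send the (small) tree structure and block names through a regular queue.
    
    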
    [D] ‘auton-survival’ package for deep survival analysis and time to event regression from CMU.
    Comes with a ‘white paper’ and example notebooks… seems legit..? Anyone tried this out yet? GitHub | Paper submitted by /u/proportional-hazard [link] [comments]
    [P] Unofficial ViT-VQGAN implementation
    I know that many people (including me) were surprised by the image quality of ViT-VQGAN and disappointed to learn that no source code will be released. Therefore, I decided to implement it myself, and here is the code. I hope this can help everyone as a starting point for ViT-VQGAN. submitted by /u/ThunaClone [link] [comments]
    [R][P] StyleGAN-Human: A Data-Centric Odyssey of Human Generation + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [D] Review of end-to-end multi-modal deep learning approach for autonomous navigation
    In reviewing various approaches to end-to-end deep learning for autonomous driving, I've come across an interesting approach in this paper that I would like to discuss with others. I will begin by summarizing the approach: A ResNet50 architecture is used as an encoder network, with the input being an RGB image and depth map concatenated as (224 x 224 x 4); the paper argues that a point cloud or some other sensor modality would also work. The encoder output (a feature map of 7 x 7 x 2048) is fed into a decoder network that takes it back to (224 x 224 x 5), giving pixel-wise semantic segmentation over 5 classes: lane, road line, sidewalk, vehicles or pedestrians, and others. That same encoder output (the 7 x 7 x 2048 feature map) is global average pooled to 2…  ( 2 min )
    mGPT: Few-Shot Learners Go Multilingual
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Biological feedback will save us all
    Dall-E 2. Excellent. It's very high quality. But it's a recombination of the data. What did we want? We wanted some amazing work that made us cry with just one line of writing or one image. "Oh, it copied well. It's pretty much the same." That's not enough. But how can that be improved? I think the answer is the feedback method. The current evaluation methods for writing, images, video, and sound are too indirect: sales revenue, number of subscribers, number of views, like/dislike ratio, ratings by section, revisit rate (the last few are better than the others), emotion analysis of comments using AI, internal staff scores. There are so many conditions other than the quality of the content in which people's judgment can intervene. In the first place, people don't express exactly what they…  ( 3 min )
    16 images generated for text prompt "Woah there, Dragonman!" using a text-to-image AI model from CompVis that uses latent diffusion (crosspost of another user's post)
    submitted by /u/Wiskkey [link] [comments]  ( 1 min )
    NVIDIA Instant NeRF: Turn Photos into 3D Scenes in Milliseconds! Video demo
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    GOOGLE researchers create animated avatars from a single photo
    submitted by /u/SpatialComputing [link] [comments]  ( 1 min )
    MIT's new machine-learning system M2I may someday help driverless cars predict the next moves of others
    submitted by /u/qptbook [link] [comments]
    Human-like AI: where should I start?
    Hello there! If one wanted to get into AI, and especially human-like AI, would you still recommend getting into machine learning first? As far as I know, machine learning doesn't even try to develop "human-like" AI / "bottom-up" AI, but rather focuses on training algorithms to solve specific problems. I know human-like AI is highly complex and we still need years, if not decades, to achieve anything close to it, but I would appreciate tips and ideas nonetheless. (After reading my question again, this sounds like a generic question that gets asked here every day; if that's the case, please send me a link to a similar post :) ) submitted by /u/Garic152 [link] [comments]  ( 1 min )
    help with a project idea
    Hi everyone, I'm doing a project with my friends where we are supposed to use computer vision/IoT to create a solution for people with disabilities or in the healthcare system. Any ideas, please? submitted by /u/armyy__ [link] [comments]
    Meta AI Researchers Built An End-To-End Machine Learning Platform Called Looper, With Easy-To-Use APIs For Decision-Making And Feedback Collection
    From improving the user experience to making the computational infrastructure more effective, AI is a crucial aspect of making current software systems and products perform as well as possible. AI is often more effective than even precisely developed human-crafted heuristic tactics today, whether it’s reducing latency, boosting the quality of a video stream, or streamlining the interfaces to match a specific person’s demands. But, to use AI more effectively in various products, several challenges must be addressed: the system must accommodate software engineers without machine learning backgrounds; it must provide mechanisms to optimize for a variety of product goals, which may differ from closed-form machine learning loss functions; it must distinguish causal connections from data correlations; and it must scale efficiently to train, host, and monitor vast numbers of AI models. Meta researchers developed ‘Looper,’ an end-to-end AI platform designed with easy-to-use APIs for optimization, personalization, and feedback collection to address these needs. Looper can be used to support the entire machine learning lifecycle, from model training to deployment and inference to product evaluation and optimization. Looper allows existing products to be modified to leverage AI for personalized optimizations rather than having to be rebuilt around AI models. Currently, the Looper platform hosts 700 AI models and produces 4 million AI outputs every second. Continue reading Paper: https://arxiv.org/pdf/2110.07554.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Are there any programs that can output a sentence based on input sentences?
    I'm looking to create a way to automate original story ideas based on previous ones. I want to be able to input 1000+ original sentences and get as output an original sentence that is inspired by the previous ones. Are there any programs that can do this, or will I need to develop my own? submitted by /u/yea_okay_dude [link] [comments]  ( 1 min )
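    Before reaching for a large language model, a word-level Markov chain is a lightweight baseline that already produces recombined sentences from an input corpus. A minimal sketch, using a tiny invented corpus in place of the 1000+ real sentences:

    ```python
    import random
    from collections import defaultdict

    def build_chain(sentences):
        """Word-level Markov chain: map each word to its observed successors."""
        chain = defaultdict(list)
        for s in sentences:
            words = ["<s>"] + s.split() + ["</s>"]
            for a, b in zip(words, words[1:]):
                chain[a].append(b)
        return chain

    def generate(chain, rng, max_words=50):
        """Walk the chain from the start token until the end token (or a cap)."""
        word, out = "<s>", []
        while len(out) < max_words:
            word = rng.choice(chain[word])
            if word == "</s>":
                break
            out.append(word)
        return " ".join(out)

    # Invented example corpus; the real input would be the 1000+ sentences.
    corpus = [
        "the hero finds a map",
        "the hero loses a map",
        "a villain finds the hero",
    ]
    sentence = generate(build_chain(corpus), random.Random(0))
    ```

    The output reshuffles observed word transitions into new combinations; for more coherent results, fine-tuning a pretrained language model on the sentence collection is the usual next step up.
    
    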
    Ultimate Guide to Activation Functions
    submitted by /u/SirFletch [link] [comments]
    How to stop a stable-baselines model during training exactly at the end of an episode?
    I am training a PPO2 model with the stable-baselines library. I have tabular data with 15000 rows, so the episode length is 15000. I am using nminibatches=4 and n_envs=1, and, for example, I have set total_timesteps=10000. During training the agent will see the 15000 rows several times and update its actions for each row, but at some point the remaining total_timesteps will not be enough to see a full episode, so only part of an episode is seen in the last stage of learning. To be concrete (for simplicity): say we have 10 rows and total_timesteps=23. The agent will see the full episode 2 times, but only the first 3 rows the third time; the remaining 7 rows are not seen in the last pass. I want to stop the learning process at the point where the agent last completes a full episode (in the example above, stop learning at total_timesteps=20), or define total_timesteps in such a way that training ends on a full episode. submitted by /u/Mariam_Dundua [link] [comments]  ( 1 min )
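    For the second option, rounding total_timesteps down to a whole multiple of the episode length guarantees training ends exactly on an episode boundary. Using the numbers from the example above:

    ```python
    # Round total_timesteps down to a whole number of episodes so that
    # training always ends at an episode boundary, never mid-episode.
    episode_len = 10   # rows per episode in the example
    requested = 23     # the total_timesteps one would naively pass in

    total_timesteps = (requested // episode_len) * episode_len
    # 23 // 10 = 2 full episodes -> train for 2 * 10 = 20 timesteps
    ```

    The same arithmetic with episode_len=15000 gives a total_timesteps that is always a multiple of the dataset length.
    
    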
    New to RL
    Hello guys, I am pretty new to the RL field and right now I am doing my thesis in it. I've come across a problem in my code: I created a custom environment, and when I try to solve it with my DQN agent using Stable-Baselines3, I am able to execute the code and print out the required things, but the agent is not learning. Any help? Thanks. submitted by /u/last_2_brain_cells97 [link] [comments]  ( 1 min )
    Questions on policy gradients
    Hi guys, I am new to RL and reading the Spinning Up tutorial, which focuses on policy-based algorithms. In the derivation of VPG, the tutorial says: "The environment has no dependence on θ (the parameters of the policy), so the gradient of R(τ) (the total return of the trajectory) with respect to θ is 0." However, the trajectory depends on our policy, and our policy depends on θ. As a result, I am confused about why the total return of a trajectory is independent of θ. submitted by /u/SkyRimT [link] [comments]  ( 2 min )
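    What the tutorial means is that R(τ) is a fixed function of the sampled states and actions, so it carries no explicit θ; all the θ-dependence sits in the probability of sampling the trajectory, which is exactly where the log-derivative trick applies:

    ```latex
    \nabla_\theta J(\theta)
      = \nabla_\theta \int P(\tau \mid \theta)\, R(\tau)\, d\tau
      = \int P(\tau \mid \theta)\, \nabla_\theta \log P(\tau \mid \theta)\, R(\tau)\, d\tau
      = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ R(\tau)\, \nabla_\theta \log P(\tau \mid \theta) \right]
    ```

    So the gradient never acts on R(τ) itself, only on the trajectory distribution P(τ | θ); R(τ) is pulled through ∇_θ as a constant, which is what the quoted sentence is asserting.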
    Vicarious exits: acquihired by Google robotics (Intrinsic) & DeepMind
    submitted by /u/gwern [link] [comments]
    I don't understand why I am getting NaN loss scores. Can anyone explain what I am doing wrong?
    submitted by /u/brike3 [link] [comments]  ( 1 min )
    Are there applications of neural networks other than machine learning?
    I see lots of hardware oriented toward AI/ML stuff these days, including chips with hardware acceleration for neural networks. I'm thinking about how GPUs were initially designed for graphics calculations, but then things like CUDA and OpenCL were developed to make that hardware usable for broader applications of parallel processing. Are there any other things that you can do with a neural network besides backpropagation, that wouldn't be easier to do in other ways? submitted by /u/Bananawamajama [link] [comments]  ( 1 min )
    My Paper Reviewing Load
    In academia, for better or worse, we have what’s called a peer review system, where papers get accepted to journals, conferences, or other venues on the basis of reviews from other researchers, who ideally are subject area experts and thus are qualified to evaluate the paper. The reviewers also cannot have a conflict of interest with the authors, and should not be overwhelmed with too many papers to review. This is the ideal world, and is not always what happens in practice. From my experience in the robotics academic community (and this may apply to other disciplines), it generally seems like there is no standard definition of an “appropriate” or “maximum” reviewing load for a reviewer. This is difficult to define as different papers mandate different reviewing efforts; a massive journal …  ( 4 min )
    A manifold learning approach for gesture recognition from micro-Doppler radar measurements. (arXiv:2110.01670v4 [cs.LG] UPDATED)
    A recent paper (Neural Networks, 132 (2020), 253-268) introduces a straightforward and simple kernel based approximation for manifold learning that does not require the knowledge of anything about the manifold, except for its dimension. In this paper, we examine how the pointwise error in approximation using least squares optimization based on similarly localized kernels depends upon the data characteristics and deteriorates as one goes away from the training data. The theory is presented with an abstract localized kernel, which can utilize any prior knowledge about the data being located on an unknown sub-manifold of a known manifold. We demonstrate the performance of our approach using a publicly available micro-Doppler data set, and investigate the use of different preprocessing measures, kernels, and manifold dimensions. Specifically, it is shown that the localized kernel introduced in the above mentioned paper when used with PCA components leads to a near-competitive performance to deep neural networks, and offers significant improvements in training speed and memory requirements. To demonstrate the fact that our methods are agnostic to the domain knowledge, we examine the classification problem in a simple video data set.
    Bayesian Learning via Neural Schrödinger-Föllmer Flows. (arXiv:2111.10510v8 [stat.ML] UPDATED)
    In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control (i.e. Schrödinger bridges). We advocate stochastic control as a finite time and low variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics (SGLD). Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to already existing VI routines in SDE-based models.
    Accurate detection of sepsis at ED triage using machine learning with clinical natural language processing. (arXiv:2204.07657v2 [cs.LG] UPDATED)
    Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Accurate detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to determine whether EHR data can be extracted and synthesized with the latest machine learning algorithms (KATE Sepsis) and clinical natural language processing to produce accurate sepsis models, and compare KATE Sepsis performance with existing sepsis screening protocols, such as SIRS and qSOFA. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals. KATE Sepsis, SIRS, standard screening (SIRS with source of infection) and qSOFA were tested in three settings. Cohort-A was a retrospective analysis on medical records from a single Site 1. Cohort-B was a prospective analysis of Site 1. Cohort-C was a retrospective analysis on Site 1 with 15 additional sites. Across all cohorts, KATE Sepsis demonstrates an AUC of 0.94-0.963 with 73-74.87% TPR and 3.76-7.17% FPR. Standard screening demonstrates an AUC of 0.682-0.726 with 39.39-51.19% TPR and 2.9-6.02% FPR. The qSOFA protocol demonstrates an AUC of 0.544-0.56, with 10.52-13.18% TPR and 1.22-1.68% FPR. For severe sepsis, across all cohorts, KATE Sepsis demonstrates an AUC of 0.935-0.972 with 70-82.26% TPR and 4.64-8.62% FPR. For septic shock, across all cohorts, KATE Sepsis demonstrates an AUC of 0.96-0.981 with 85.71-89.66% TPR and 4.85-8.8% FPR. SIRS, standard screening, and qSOFA demonstrate low AUC and TPR for severe sepsis and septic shock detection. KATE Sepsis provided substantially better sepsis detection performance in triage than commonly used screening protocols.
    Visual Attention Methods in Deep Learning: An In-Depth Survey. (arXiv:2204.07756v2 [cs.CV] UPDATED)
    Inspired by the human cognitive system, attention is a mechanism that imitates the human cognitive awareness about specific information, amplifying critical details to focus more on the essential aspects of data. Deep learning has employed attention to boost performance for many applications. Interestingly, the same attention design can suit processing different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be incorporated in one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey specific to attention techniques to guide researchers in employing attention in their deep models. Note that, besides being demanding in terms of training data and computational resources, transformers only cover a single category in self-attention out of the many categories available. We fill this gap and provide an in-depth survey of 50 attention techniques categorizing them by their most prominent features. We initiate our discussion by introducing the fundamental concepts behind the success of attention mechanism. Next, we furnish some essentials such as the strengths and limitations of each attention category, describe their fundamental building blocks, basic formulations with primary usage, and applications specifically for computer vision. We also discuss the challenges and open questions related to attention mechanism in general. Finally, we recommend possible future research directions for deep attention.
    On Distribution Shift in Learning-based Bug Detectors. (arXiv:2204.10049v1 [cs.LG])
    Deep learning has recently achieved initial success in program analysis tasks such as bug detection. Lacking real bugs, most existing works construct training and test data by injecting synthetic bugs into correct programs. Despite achieving high test accuracy (e.g. >90%), the resulting bug detectors are found to be surprisingly unusable in practice, i.e., <10% precision when used to scan real software repositories. In this work, we argue that this massive performance difference is caused by distribution shift, i.e., a fundamental mismatch between the real bug distribution and the synthetic bug distribution used to train and evaluate the detectors. To address this key challenge, we propose to train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution. During these two phases, we leverage a multi-task hierarchy, focal loss, and contrastive learning to further boost performance. We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution. The results demonstrate that our approach is practically effective and successfully mitigates the distribution shift: our learned detectors are highly performant on both our constructed test set and the latest version of open source repositories.
    Persua: A Visual Interactive System to Enhance the Persuasiveness of Arguments in Online Discussion. (arXiv:2204.07741v2 [cs.HC] UPDATED)
    Persuading people to change their opinions is a common practice in online discussion forums on topics ranging from political campaigns to relationship consultation. Enhancing people's ability to write persuasive arguments could not only practice their critical thinking and reasoning but also contribute to the effectiveness and civility in online communication. It is, however, not an easy task in online discussion settings where written words are the primary communication channel. In this paper, we derived four design goals for a tool that helps users improve the persuasiveness of arguments in online discussions through a survey with 123 online forum users and interviews with five debating experts. To satisfy these design goals, we analyzed and built a labeled dataset of fine-grained persuasive strategies (i.e., logos, pathos, ethos, and evidence) in 164 arguments with high ratings on persuasiveness from ChangeMyView, a popular online discussion forum. We then designed an interactive visual system, Persua, which provides example-based guidance on persuasive strategies to enhance the persuasiveness of arguments. In particular, the system constructs portfolios of arguments based on different persuasive strategies applied to a given discussion topic. It then presents concrete examples based on the difference between the portfolios of user input and high-quality arguments in the dataset. A between-subjects study shows suggestive evidence that Persua encourages users to submit more times for feedback and helps users improve more on the persuasiveness of their arguments than a baseline system. Finally, a set of design considerations was summarized to guide future intelligent systems that improve the persuasiveness in text.
    Learning to Hash Naturally Sorts. (arXiv:2201.13322v2 [cs.CV] UPDATED)
    Learning to hash pictures a list-wise sorting problem. Its testing metrics, e.g., mean-average precision, count on a sorted candidate list ordered by pair-wise code similarity. However, scarcely does one train a deep hashing model with the sorted results end-to-end because of the non-differentiable nature of the sorting operation. This inconsistency in the objectives of training and test may lead to sub-optimal performance since the training loss often fails to reflect the actual retrieval metric. In this paper, we tackle this problem by introducing Naturally-Sorted Hashing (NSH). We sort the Hamming distances of samples' hash codes and accordingly gather their latent representations for self-supervised training. Thanks to the recent advances in differentiable sorting approximations, the hash head receives gradients from the sorter so that the hash encoder can be optimized along with the training procedure. Additionally, we describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning, which allows NSH to mine data semantic relations during training in an unsupervised manner. Our extensive experiments show the proposed NSH model significantly outperforms the existing unsupervised hashing methods on three benchmarked datasets.
    Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets. (arXiv:2109.13514v2 [cs.CV] UPDATED)
    Shapelet-based algorithms are widely used for time series classification because of their ease of interpretation, but they are currently outperformed by recent state-of-the-art approaches. We present a new formulation of time series shapelets including the notion of dilation, and we introduce a new shapelet feature to enhance their discriminative power for classification. Experiments performed on 112 datasets show that our method improves on the state-of-the-art shapelet algorithm, and achieves comparable accuracy to recent state-of-the-art approaches, without sacrificing neither scalability, nor interpretability.
    Backplay: "Man muss immer umkehren". (arXiv:1807.06919v5 [cs.LG] UPDATED)
    Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency. This includes reward shaping, behavioral cloning, and reverse curriculum generation.
    Deep learning techniques for energy clustering in the CMS ECAL. (arXiv:2204.10277v1 [hep-ex])
    The reconstruction of electrons and photons in CMS depends on topological clustering of the energy deposited by an incident particle in different crystals of the electromagnetic calorimeter (ECAL). These clusters are formed by aggregating neighbouring crystals according to the expected topology of an electromagnetic shower in the ECAL. The presence of upstream material (beampipe, tracker and support structures) causes electrons and photons to start showering before reaching the calorimeter. This effect, combined with the 3.8T CMS magnetic field, leads to energy being spread in several clusters around the primary one. It is essential to recover the energy contained in these satellite clusters in order to achieve the best possible energy resolution for physics analyses. Historically satellite clusters have been associated to the primary cluster using a purely topological algorithm which does not attempt to remove spurious energy deposits from additional pileup interactions (PU). The performance of this algorithm is expected to degrade during LHC Run 3 (2022+) because of the larger average PU levels and the increasing levels of noise due to the ageing of the ECAL detector. New methods are being investigated that exploit state-of-the-art deep learning architectures like Graph Neural Networks (GNN) and self-attention algorithms. These more sophisticated models improve the energy collection and are more resilient to PU and noise, helping to preserve the electron and photon energy resolution achieved during LHC Runs 1 and 2. This work will cover the challenges of training the models as well the opportunity that this new approach offers to unify the ECAL energy measurement with the particle identification steps used in the global CMS photon and electron reconstruction.
    Condition Monitoring of Transformer Bushings Using Computational Intelligence. (arXiv:2204.10193v1 [cs.LG])
    Dissolved Gas-in-oil Analysis (DGA) is used to monitor the condition of bushings on large power transformers. Different techniques are used to assess condition from the collected data; this work investigates Artificial Intelligence techniques. It examines which gases in DGA are related to each other and which are important for decision making. Once the related and crucial gases are determined, the remaining gases are discarded, thereby reducing the number of attributes in the DGA data. A further investigation then examines how these reduced datasets influence the performance of the classifiers used to classify the full-attribute DGA data. The classifiers used in these experiments were Backpropagation Neural Networks (BPNN) and Support Vector Machines (SVM), while Principal Component Analysis (PCA), Rough Sets (RS), Incremental Granular Ranking (GR++) and Decision Trees (DT) were used to reduce the attributes of the dataset. The parameters used when training the BPNN and SVM classifiers were kept fixed to create a controlled test environment for investigating the effects of reducing the number of gases. This work further introduces a new classifier, the Rough Neural Network (RNN), that can handle high-dimensional and noisy datasets.
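    A gas-relatedness analysis could, for instance, start from pairwise Pearson correlations between gas readings; this simple filter is illustrative only, since the paper uses PCA, rough sets, GR++ and decision trees for attribute reduction:

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def related_gases(readings, threshold=0.9):
    """Flag gas pairs whose readings are strongly correlated, so one of
    each pair could be dropped to reduce the DGA attribute count.

    readings: dict mapping gas name -> list of measured concentrations.
    """
    names = list(readings)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(readings[a], readings[b])) >= threshold:
                pairs.append((a, b))
    return pairs
```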
    Geometry-Aware Supertagging with Heterogeneous Dynamic Convolutions. (arXiv:2203.12235v2 [cs.CL] UPDATED)
    The syntactic categories of categorial grammar formalisms are structured units made of smaller, indivisible primitives, bound together by the underlying grammar's category formation rules. In the trending approach of constructive supertagging, neural models are increasingly made aware of the internal category structure, which in turn enables them to more reliably predict rare and out-of-vocabulary categories, with significant implications for grammars previously deemed too complex to find practical use. In this work, we revisit constructive supertagging from a graph-theoretic perspective, and propose a framework based on heterogeneous dynamic graph convolutions aimed at exploiting the distinctive structure of a supertagger's output space. We test our approach on a number of categorial grammar datasets spanning different languages and grammar formalisms, achieving substantial improvements over previous state of the art scores. Code will be made available at https://github.com/konstantinosKokos/dynamic-graph-supertagging
    Hybrid Cloud-Edge Collaborative Data Anomaly Detection in Industrial Sensor Networks. (arXiv:2204.09942v1 [cs.CR])
    Industrial control systems (ICSs) are facing increasing cyber-physical attacks that can cause catastrophes in the physical system. Efficient anomaly detection models for industrial sensor networks are essential for enhancing ICS reliability and security, since the sensor data reflects the operational state of the ICS. Considering the limited availability of computing resources, this paper proposes a hybrid anomaly detection approach for cloud-edge collaborative industrial sensor networks. The hybrid approach consists of sensor data detection models deployed at the edges and a sensor data analysis model deployed in the cloud. The sensor data detection model, based on Gaussian and Bayesian algorithms, can detect anomalous sensor data in real-time and upload it to the cloud for further analysis, filtering out normal sensor data and reducing traffic load. The sensor data analysis model, based on Graph convolutional networks, Residual connections and Long short-term memory networks (GCRL), can effectively extract spatial and temporal features and then identify attacks precisely. The proposed hybrid anomaly detection approach is evaluated using a benchmark dataset and baseline anomaly detection models. The experimental results show that the proposed approach achieves an overall 11.19% increase in Recall and an impressive 14.29% improvement in F1-score compared with the existing models.
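    The edge-side Gaussian detector can be sketched as a mean/standard-deviation threshold that forwards only suspicious readings to the cloud (an illustrative simplification of the paper's Gaussian-and-Bayesian edge model):

```python
import math

class GaussianEdgeDetector:
    """Per-sensor Gaussian anomaly filter for the edge: flag readings
    far from the fitted mean and only upload those to the cloud."""

    def __init__(self, k=3.0):
        self.k = k          # number of standard deviations tolerated
        self.mu = 0.0
        self.sigma = 1.0

    def fit(self, normal_readings):
        n = len(normal_readings)
        self.mu = sum(normal_readings) / n
        var = sum((x - self.mu) ** 2 for x in normal_readings) / n
        self.sigma = math.sqrt(var) or 1.0  # avoid zero-width bands
        return self

    def is_anomalous(self, x):
        return abs(x - self.mu) > self.k * self.sigma

    def filter_for_cloud(self, stream):
        # Only suspicious readings are forwarded, reducing traffic load.
        return [x for x in stream if self.is_anomalous(x)]
```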
    Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data. (arXiv:2009.09139v3 [cs.LG] UPDATED)
    Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single task fine-tuning methods while being parameter and data efficient (using around 66% of the data for weight updates). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms by 0.7-1.0% models that use MTL and single task fine-tuning. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets. Our code is publicly available at https://github.com/CAMTL/CA-MTL.
    The Silent Problem -- Machine Learning Model Failure -- How to Diagnose and Fix Ailing Machine Learning Models. (arXiv:2204.10227v1 [cs.LG])
    The COVID-19 pandemic has dramatically changed how healthcare is delivered to patients, how patients interact with healthcare providers, and how healthcare information is disseminated to both healthcare providers and patients. Analytical models that were trained and tested pre-pandemic may no longer perform up to expectations, becoming unreliable and irrelevant, given that machine learning (ML) depends on the basic principle that what happened in the past is likely to repeat in the future. ML faces two important modes of degradation: concept drift, when the underlying properties and characteristics of the variables change, and data drift, when the data distributions, probabilities, covariates, and other variable relationships change; both are prime culprits of model failure. Therefore, detecting and diagnosing drift in existing models has become imperative. Perhaps even more important is a shift in our mindset towards a conscious recognition that drift is inevitable, and that model building must incorporate intentional resilience, the ability to offset and recover quickly from failure, and proactive robustness, avoiding failure by developing models that are less vulnerable to drift and disruption.
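    Data drift of the kind described here can be checked with, e.g., a two-sample Kolmogorov-Smirnov statistic on a feature; the paper does not prescribe a specific test, so this is purely illustrative:

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of two samples. A large value suggests the live
    data has drifted away from the training distribution."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        fa = bisect.bisect_right(a, x) / len(a)  # empirical CDF of a at x
        fb = bisect.bisect_right(b, x) / len(b)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d
```

    Running this per feature over a sliding window of production data gives a cheap, model-agnostic drift alarm to trigger retraining or diagnosis.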
    A Revealing Large-Scale Evaluation of Unsupervised Anomaly Detection Algorithms. (arXiv:2204.09825v1 [cs.LG])
    Anomaly detection has many applications ranging from bank-fraud detection and cyber-threat detection to equipment maintenance and health monitoring. However, choosing a suitable algorithm for a given application remains a challenging design decision, often informed by the literature on anomaly detection algorithms. We extensively reviewed twelve of the most popular unsupervised anomaly detection methods. We observed that, so far, they have been compared using inconsistent protocols - the choice of the class of interest or the positive class, the split of training and test data, and the choice of hyperparameters - leading to ambiguous evaluations. This observation led us to define a coherent evaluation protocol which we then used to produce an updated and more precise picture of the relative performance of the twelve methods on five widely used tabular datasets. While our evaluation cannot pinpoint a method that outperforms all the others on all datasets, it identifies those that stand out and revises misconceptions about their relative performance.
    Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. (arXiv:2204.09781v1 [cs.DL])
    The COVID-19 pandemic has been severely impacting global society since December 2019. Massive research has been undertaken to understand the characteristics of the virus and design vaccines and drugs. The related findings have been reported in biomedical literature at a rate of about 10,000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200,000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g., Diagnosis and Treatment) to the articles in LitCovid. Despite the continuing advances in biomedical text mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30,000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multilabel classification datasets in biomedical scientific literature. 19 teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest performing submissions achieved 0.8875, 0.9181, and 0.9394 for macro F1-score, micro F1-score, and instance-based F1-score, respectively. The level of participation and results demonstrate a successful track and help close the gap between dataset curation and method development. The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development.
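    The three reported metrics can be computed as follows for multi-label predictions (a standard formulation, not the track's official scorer):

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def multilabel_f1(gold, pred, labels):
    """Macro, micro, and instance-based F1 for multi-label annotation.

    gold, pred: lists of label sets, one per article.
    """
    counts = {l: [0, 0, 0] for l in labels}  # per-label tp, fp, fn
    for g, p in zip(gold, pred):
        for l in labels:
            counts[l][0] += int(l in g and l in p)
            counts[l][1] += int(l not in g and l in p)
            counts[l][2] += int(l in g and l not in p)
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    # Instance-based: F1 per article, averaged over articles.
    inst = sum(f1(len(g & p), len(p - g), len(g - p))
               for g, p in zip(gold, pred)) / len(gold)
    return macro, micro, inst
```

    Macro F1 weights every topic equally, micro F1 weights every label decision equally, and instance-based F1 scores each article's predicted topic set on its own.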
    Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space. (arXiv:2204.09831v1 [physics.chem-ph])
    We introduce an unsupervised clustering algorithm to improve training efficiency and accuracy in predicting energies with molecular-orbital-based machine learning (MOB-ML). This work determines clusters via the Gaussian mixture model (GMM) in an entirely automatic manner and simplifies an earlier supervised clustering approach [J. Chem. Theory Comput., 15, 6668 (2019)] by eliminating both the need for user-specified parameters and the training of an additional classifier. The unsupervised clustering results from the GMM have the advantage of accurately reproducing chemically intuitive groupings of frontier molecular orbitals and of improving performance as the number of training examples increases. The resulting clusters from supervised or unsupervised clustering are further combined with scalable Gaussian process regression (GPR) or linear regression (LR) to learn molecular energies accurately by generating a local regression model in each cluster. Among all four combinations of regressors and clustering methods, GMM combined with scalable exact Gaussian process regression (GMM/GPR) is the most efficient training protocol for MOB-ML. Numerical tests of molecular energy learning on thermalized datasets of drug-like molecules demonstrate the improved accuracy, transferability, and learning efficiency of GMM/GPR over the other training protocols for MOB-ML, i.e., supervised regression-clustering combined with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best molecular energy predictions compared with those reported in the literature on the same benchmark datasets. With lower scaling, GMM/GPR achieves a 10.4-fold speedup in wall-clock training time compared with scalable exact GPR at a training size of 6500 QM7b-T molecules.
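    The cluster-then-regress protocol can be sketched with per-cluster least-squares models over precomputed cluster labels (1-D features and plain linear regression for brevity; the paper fits a GMM and uses scalable GPR within each cluster):

```python
def fit_local_models(X, y, labels):
    """Fit one least-squares linear model per cluster.

    X, y: 1-D features and targets; labels: cluster id per sample
    (in the paper these would come from the GMM).
    Returns {cluster_id: (slope, intercept)}.
    """
    models = {}
    for c in set(labels):
        xs = [x for x, l in zip(X, labels) if l == c]
        ys = [t for t, l in zip(y, labels) if l == c]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        denom = sum((x - mx) ** 2 for x in xs) or 1.0
        slope = sum((x - mx) * (t - my) for x, t in zip(xs, ys)) / denom
        models[c] = (slope, my - slope * mx)
    return models

def predict_local(models, x, label):
    """Predict with the local model of the sample's cluster."""
    slope, intercept = models[label]
    return slope * x + intercept
```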
    Memory Bounds for the Experts Problem. (arXiv:2204.09837v1 [cs.DS])
    Online learning with expert advice is a fundamental problem of sequential prediction. In this problem, the algorithm has access to a set of $n$ "experts" who make predictions on each day. The goal on each day is to process these predictions, and make a prediction with the minimum cost. After making a prediction, the algorithm sees the actual outcome on that day, updates its state, and then moves on to the next day. An algorithm is judged by how well it does compared to the best expert in the set. The classical algorithm for this problem is the multiplicative weights algorithm. However, every application, to our knowledge, relies on storing weights for every expert, and uses $\Omega(n)$ memory. There is little work on understanding the memory required to solve the online learning with expert advice problem, or run standard sequential prediction algorithms, in natural streaming models, which is especially important when the number of experts, as well as the number of days on which the experts make predictions, is large. We initiate the study of the learning with expert advice problem in the streaming setting, and show lower and upper bounds. Our lower bound for i.i.d., random order, and adversarial order streams uses a reduction to a custom-built problem using a novel masking technique, to show a smooth trade-off for regret versus memory. Our upper bounds show novel ways to run standard sequential prediction algorithms in rounds on small "pools" of experts, thus reducing the necessary memory. For random-order streams, we show that our upper bound is tight up to low order terms. We hope that these results and techniques will have broad applications in online learning, and can inspire algorithms based on standard sequential prediction techniques, like multiplicative weights, for a wide range of other problems in the memory-constrained setting.
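    The classical multiplicative weights baseline that these memory bounds are measured against keeps one weight per expert, which is exactly the $\Omega(n)$ memory the paper questions. A minimal version for binary predictions:

```python
def multiplicative_weights(predictions, outcomes, eta=0.5):
    """Prediction with expert advice via multiplicative weights.

    predictions: list of days, each a list of n expert predictions (0/1).
    outcomes: list of true outcomes (0/1), one per day.
    Returns the algorithm's predictions (weighted majority votes).
    """
    n = len(predictions[0])
    w = [1.0] * n  # one weight per expert: the Omega(n) memory
    guesses = []
    for preds, truth in zip(predictions, outcomes):
        vote1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        guesses.append(1 if vote1 >= sum(w) / 2 else 0)
        # Multiplicatively penalize experts that were wrong today.
        w = [wi * (1.0 - eta) if p != truth else wi
             for wi, p in zip(w, preds)]
    return guesses
```

    The streaming question the paper studies is how well this guarantee can be approximated when only a small "pool" of expert weights fits in memory.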
    FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis. (arXiv:2204.09934v1 [eess.AS])
    Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performance in many generative tasks. However, their inherently iterative sampling process is costly, which has hindered their application to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions with diverse receptive field patterns to efficiently model long-term time dependencies under adaptive conditions. A noise schedule predictor is also adopted to reduce the number of sampling steps without sacrificing generation quality. Based on FastDiff, we design an end-to-end text-to-speech synthesizer, FastDiff-TTS, which generates high-fidelity speech waveforms without any intermediate features (e.g., Mel-spectrograms). Our evaluation of FastDiff demonstrates state-of-the-art results with higher-quality (MOS 4.28) speech samples. FastDiff also enables a sampling speed 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time. We further show that FastDiff generalizes well to mel-spectrogram inversion of unseen speakers, and that FastDiff-TTS outperforms other competing methods in end-to-end text-to-speech synthesis. Audio samples are available at \url{https://FastDiff.github.io/}.
    Eliminating Backdoor Triggers for Deep Neural Networks Using Attention Relation Graph Distillation. (arXiv:2204.09975v1 [cs.LG])
    Due to the prosperity of Artificial Intelligence (AI) techniques, more and more backdoors are designed by adversaries to attack Deep Neural Networks (DNNs). Although the state-of-the-art method Neural Attention Distillation (NAD) can effectively erase backdoor triggers from DNNs, it still suffers from a non-negligible Attack Success Rate (ASR) together with lowered classification ACCuracy (ACC), since NAD focuses on backdoor defense using attention features (i.e., attention maps) of the same order. In this paper, we introduce a novel backdoor defense framework named Attention Relation Graph Distillation (ARGD), which fully explores the correlation among attention features of different orders using our proposed Attention Relation Graphs (ARGs). Based on the alignment of ARGs between teacher and student models during knowledge distillation, ARGD can eradicate more backdoor triggers than NAD. Comprehensive experimental results show that, against six recent backdoor attacks, ARGD outperforms NAD by up to a 94.85% reduction in ASR, while ACC is improved by up to 3.23%.
    Social Media Sentiment Analysis for Cryptocurrency Market Prediction. (arXiv:2204.10185v1 [cs.CL])
    In this paper, we explore the usability of different natural language processing models for the sentiment analysis of social media applied to financial market prediction, using the cryptocurrency domain as a reference. We study how the different sentiment metrics are correlated with the price movements of Bitcoin. For this purpose, we explore different methods to calculate the sentiment metrics from a text finding most of them not very accurate for this prediction task. We find that one of the models outperforms more than 20 other public ones and makes it possible to fine-tune it efficiently given its interpretable nature. Thus we confirm that interpretable artificial intelligence and natural language processing methods might be more valuable practically than non-explainable and non-interpretable ones. In the end, we analyse potential causal connections between the different sentiment metrics and the price movements.
    FedCL: Federated Contrastive Learning for Privacy-Preserving Recommendation. (arXiv:2204.09850v1 [cs.LG])
    Contrastive learning is widely used for recommendation model learning, where selecting representative and informative negative samples is critical. Existing methods usually focus on centralized data, where abundant and high-quality negative samples are easy to obtain. However, centralized user data storage and exploitation may lead to privacy risks and concerns, while decentralized user data on a single client can be too sparse and biased for accurate contrastive learning. In this paper, we propose a federated contrastive learning method named FedCL for privacy-preserving recommendation, which can exploit high-quality negative samples for effective model training with privacy well protected. We first infer user embeddings from local user data through the local model on each client, and then perturb them with local differential privacy (LDP) before sending them to a central server for hard negative sampling. Since individual user embedding contains heavy noise due to LDP, we propose to cluster user embeddings on the server to mitigate the influence of noise, and the cluster centroids are used to retrieve hard negative samples from the item pool. These hard negative samples are delivered to user clients and mixed with the observed negative samples from local data as well as in-batch negatives constructed from positive samples for federated model training. Extensive experiments on four benchmark datasets show FedCL can empower various recommendation methods in a privacy-preserving way.
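    The client-side LDP step can be sketched as norm clipping plus Laplace noise before the embedding leaves the device; the clipping bound and the per-coordinate noise scale below are assumptions for illustration, not the paper's exact mechanism:

```python
import random

def ldp_perturb(embedding, clip=1.0, eps=1.0):
    """Clip a user embedding's L2 norm, then add Laplace noise locally
    before it is sent to the server. eps is the (assumed) privacy budget.
    """
    norm = sum(v * v for v in embedding) ** 0.5
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    clipped = [v * scale for v in embedding]
    b = 2.0 * clip / eps  # Laplace scale, assuming sensitivity 2*clip
    # Laplace(0, b) sampled as the difference of two exponentials
    # with mean b (a standard identity).
    return [v + random.expovariate(1.0 / b) - random.expovariate(1.0 / b)
            for v in clipped]
```

    Because individual perturbed embeddings are noisy, the server in FedCL clusters them and uses only the cluster centroids for hard negative sampling, averaging the noise away.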
    Adversarial Contrastive Learning by Permuting Cluster Assignments. (arXiv:2204.10314v1 [cs.LG])
    Contrastive learning has gained popularity as an effective self-supervised representation learning technique. Several research directions improve traditional contrastive approaches, e.g., prototypical contrastive methods better capture the semantic similarity among instances and reduce the computational burden by considering cluster prototypes or cluster assignments, while adversarial instance-wise contrastive methods improve robustness against a variety of attacks. To the best of our knowledge, no prior work jointly considers robustness, cluster-wise semantic similarity and computational efficiency. In this work, we propose SwARo, an adversarial contrastive framework that incorporates cluster assignment permutations to generate representative adversarial samples. We evaluate SwARo on multiple benchmark datasets and against various white-box and black-box attacks, obtaining consistent improvements over state-of-the-art baselines.
    TND-NAS: Towards Non-Differentiable Objectives in Differentiable Neural Architecture Search. (arXiv:2111.03892v2 [cs.LG] UPDATED)
    Differentiable architecture search has gradually become the mainstream research topic in Neural Architecture Search (NAS) thanks to its high efficiency compared with early (EA-based, RL-based) NAS methods. Recent differentiable NAS also aims at further improving search performance and reducing GPU-memory consumption. However, these methods can no longer naturally handle non-differentiable objectives, e.g., energy, resource-constrained efficiency, and other metrics, let alone multi-objective search demands. Research in multi-objective NAS targets this problem but requires vast computational resources because each candidate architecture is optimized separately. In light of this discrepancy, we propose TND-NAS, which combines the high efficiency of the differentiable NAS framework with the compatibility with non-differentiable metrics of multi-objective NAS. Under the differentiable NAS framework, with the continuous relaxation of the search space, TND-NAS optimizes the architecture parameters ($\alpha$) in discrete space, while resorting to progressive search space shrinking by $\alpha$. Taking two objectives (parameters, accuracy) as a representative example, we achieve a series of high-performance compact architectures on the CIFAR10 (1.09M/3.3\%, 2.4M/2.95\%, 9.57M/2.54\%) and CIFAR100 (2.46M/18.3\%, 5.46M/16.73\%, 12.88M/15.20\%) datasets. Favorably, compared with other multi-objective NAS methods, TND-NAS is less time-consuming (1.3 GPU-days on an NVIDIA 1080Ti, 1/6 of that of NSGA-Net) and can be conveniently adapted to real-world NAS scenarios (resource-constrained, platform-specialized).
    Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning. (arXiv:2106.09226v2 [cs.LG] UPDATED)
    Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
    Surfer100: Generating Surveys From Web Resources on Wikipedia-style. (arXiv:2112.06377v2 [cs.CL] UPDATED)
    Fast-developing fields such as Artificial Intelligence (AI) often outpace the efforts of encyclopedic sources such as Wikipedia, which either do not completely cover recently-introduced topics or lack such content entirely. As a result, methods for automatically producing content are valuable tools to address this information overload. We show that recent advances in pretrained language modeling can be combined in a two-stage extractive and abstractive approach for Wikipedia lead-paragraph generation. We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys. To the best of our knowledge, this is the first study on utilizing web resources for long Wikipedia-style summaries.
    Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity. (arXiv:2111.05329v3 [cs.CV] UPDATED)
    We present CrissCross, a self-supervised framework for learning audio-visual representations. A novel notion is introduced in our framework whereby, in addition to learning the intra-modal and standard synchronous cross-modal relations, CrissCross also learns asynchronous cross-modal relationships. We show that by relaxing the temporal synchronicity between the audio and visual modalities, the network learns strong generalized representations. Our experiments show that strong augmentations for both audio and visual modalities, together with relaxed cross-modal temporal synchronicity, optimize performance. To pretrain our proposed framework, we use three datasets of varying sizes: Kinetics-Sound, Kinetics400, and AudioSet. The learned representations are evaluated on a number of downstream tasks, namely action recognition, sound classification, and retrieval. CrissCross shows state-of-the-art performance on action recognition (UCF101 and HMDB51) and sound classification (ESC50 and DCASE). The code and pretrained models will be made publicly available.
    NICO++: Towards Better Benchmarking for Domain Generalization. (arXiv:2204.08040v2 [cs.CV] UPDATED)
    Despite the remarkable performance that modern deep neural networks have achieved on independent and identically distributed (I.I.D.) data, they can crash under distribution shifts. Most current evaluation methods for domain generalization (DG) adopt the leave-one-out strategy as a compromise on the limited number of domains. We propose a large-scale benchmark with extensive labeled domains named NICO++ along with more rational evaluation methods for comprehensively evaluating DG algorithms. To evaluate DG datasets, we propose two metrics to quantify covariate shift and concept shift, respectively. Two novel generalization bounds from the perspective of data construction are proposed to prove that limited concept shift and significant covariate shift favor the evaluation capability for generalization. Through extensive experiments, NICO++ shows its superior evaluation capability compared with current DG datasets and its contribution in alleviating unfairness caused by the leak of oracle knowledge in model selection.
    Relevance-guided Unsupervised Discovery of Abilities with Quality-Diversity Algorithms. (arXiv:2204.09828v1 [cs.NE])
    Quality-Diversity algorithms provide efficient mechanisms to generate large collections of diverse and high-performing solutions, which have been shown to be instrumental for solving downstream tasks. However, most of those algorithms rely on a hand-coded behavioural descriptor to characterise the diversity, hence requiring prior knowledge about the considered tasks. In this work, we introduce Relevance-guided Unsupervised Discovery of Abilities, a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand. In particular, our method introduces a custom diversity metric that leads to higher densities of solutions near the areas of interest in the learnt behavioural descriptor space. We evaluate our approach in a simulated robotic environment, where the robot has to autonomously discover its abilities from its full sensory data, on three tasks: navigation to random targets, moving forward with a high velocity, and performing half-rolls. The experimental results show that our method discovers collections of solutions that are not only diverse but also well-adapted to the considered downstream task.
    DeepGate: Learning Neural Representations of Logic Gates. (arXiv:2111.14616v3 [cs.LG] UPDATED)
    Applying deep learning (DL) techniques in the electronic design automation (EDA) field has become a trending topic. Most solutions apply well-developed DL models to solve specific EDA problems. While demonstrating promising results, they require careful model tuning for every problem. The fundamental question on "How to obtain a general and effective neural representation of circuits?" has not been answered yet. In this work, we take the first step towards solving this problem. We propose DeepGate, a novel representation learning solution that effectively embeds both logic function and structural information of a circuit as vectors on each gate. Specifically, we propose transforming circuits into unified and-inverter graph format for learning and using signal probabilities as the supervision task in DeepGate. We then introduce a novel graph neural network that uses strong inductive biases in practical circuits as learning priors for signal probability prediction. Our experimental results show the efficacy and generalization capability of DeepGate.
    BTranspose: Bottleneck Transformers for Human Pose Estimation with Self-Supervised Pre-Training. (arXiv:2204.10209v1 [cs.LG])
    The task of 2D human pose estimation is challenging as the number of keypoints is typically large (~ 17), which necessitates robust neural network architectures and training pipelines that can capture the relevant features from the input image. These features are then aggregated to make accurate heatmap predictions from which the final keypoints of human body parts can be inferred. Many papers in the literature use CNN-based architectures for the backbone and/or combine them with a transformer, after which the features are aggregated to make the final keypoint predictions [1]. In this paper, we consider the recently proposed Bottleneck Transformers [2], which effectively combine CNN and multi-head self-attention (MHSA) layers, and we integrate them with a Transformer encoder, applying the result to the task of 2D human pose estimation. We consider different backbone architectures and pre-train them using the DINO self-supervised learning method [3]; this pre-training is found to improve the overall prediction accuracy. We call our model BTranspose, and experiments show that on the COCO validation set, our model achieves an AP of 76.4, which is competitive with other methods such as [1] while having fewer network parameters. Furthermore, we also present the dependencies of the final predicted keypoints on both the MHSA block and the Transformer encoder layers, providing clues about the image sub-regions the network attends to at the mid and high levels.
    Understanding the Domain Gap in LiDAR Object Detection Networks. (arXiv:2204.10024v1 [cs.CV])
    In order to make autonomous driving a reality, artificial neural networks have to work reliably in the open-world. However, the open-world is vast and continuously changing, so it is not technically feasible to collect and annotate training datasets which accurately represent this domain. Therefore, there are always domain gaps between training datasets and the open-world which must be understood. In this work, we investigate the domain gaps between high-resolution and low-resolution LiDAR sensors in object detection networks. Using a unique dataset, which enables us to study sensor resolution domain gaps independent of other effects, we show two distinct domain gaps - an inference domain gap and a training domain gap. The inference domain gap is characterised by a strong dependence on the number of LiDAR points per object, while the training gap shows no such dependence. These findings show that different approaches are required to close these inference and training domain gaps.
    Towards Reliable Neural Generative Modeling of Detectors. (arXiv:2204.09947v1 [physics.ins-det])
    The increasing luminosities of future data taking at the Large Hadron Collider and next-generation collider experiments require an unprecedented amount of simulated events to be produced. Such large scale productions demand a significant amount of valuable computing resources. This creates a demand for new approaches to event generation and simulation of detector responses. In this paper, we discuss the application of generative adversarial networks (GANs) to the simulation of the LHCb experiment events. We emphasize main pitfalls in the application of GANs and study the systematic effects in detail. The presented results are based on the Geant4 simulation of the LHCb Cherenkov detector.
    Conditional entropy minimization principle for learning domain invariant representation features. (arXiv:2201.10460v3 [cs.LG] UPDATED)
    Invariance principle-based methods, for example, Invariant Risk Minimization (IRM), have recently emerged as promising approaches for Domain Generalization (DG). Despite the promising theory, invariance principle-based approaches fail in common classification tasks due to the mixture of the true invariant features and the spurious invariant features. In this paper, we propose a framework based on the conditional entropy minimization principle to filter out the spurious invariant features leading to a new algorithm with a better generalization capability. We theoretically prove that under some particular assumptions, the representation function can precisely recover the true invariant features. In addition, we also show that the proposed approach is closely related to the well-known Information Bottleneck (IB) framework. Both the theoretical and numerical results are provided to justify our approach.
    Scalable Sensitivity and Uncertainty Analysis for Causal-Effect Estimates of Continuous-Valued Interventions. (arXiv:2204.10022v1 [cs.LG])
    Estimating the effects of continuous-valued interventions from observational data is critically important in fields such as climate science, healthcare, and economics. Recent work focuses on designing neural-network architectures and regularization functions to allow for scalable estimation of average and individual-level dose response curves from high-dimensional, large-sample data. Such methodologies assume ignorability (all confounding variables are observed) and positivity (all levels of treatment can be observed for every unit described by a given covariate value), which are especially challenged in the continuous treatment regime. Developing scalable sensitivity and uncertainty analyses that allow us to understand the ignorance induced in our estimates when these assumptions are relaxed receives less attention. Here, we develop a continuous treatment-effect marginal sensitivity model (CMSM) and derive bounds that agree with both the observed data and a researcher-defined level of hidden confounding. We introduce a scalable algorithm to derive the bounds and uncertainty-aware deep models to efficiently estimate these bounds for high-dimensional, large-sample observational data. We validate our methods using both synthetic and real-world experiments. For the latter, we work in concert with climate scientists interested in evaluating the climatological impacts of human emissions on cloud properties using satellite observations from the past 15 years: a finite-data problem known to be complicated by the presence of a multitude of unobserved confounders.
    Intact-VAE: Estimating Treatment Effects under Unobserved Confounding. (arXiv:2101.06662v3 [stat.ML] UPDATED)
    NOTE: This preprint has a flawed theoretical formulation. Please avoid it and refer to the ICLR22 publication https://openreview.net/forum?id=q7n2RngwOM. Also, arXiv:2109.15062 contains some new ideas on unobserved Confounding. As an important problem of causal inference, we discuss the identification and estimation of treatment effects under unobserved confounding. Representing the confounder as a latent variable, we propose Intact-VAE, a new variant of variational autoencoder (VAE), motivated by the prognostic score that is sufficient for identifying treatment effects. We theoretically show that, under certain settings, treatment effects are identified by our model, and further, based on the identifiability of our model (i.e., determinacy of representation), our VAE is a consistent estimator with representation balanced for treatment groups. Experiments on (semi-)synthetic datasets show state-of-the-art performance under diverse settings.
    Learning Forward Dynamics Model and Informed Trajectory Sampler for Safe Quadruped Navigation. (arXiv:2204.08647v3 [cs.RO] UPDATED)
    For autonomous quadruped robot navigation in various complex environments, a typical SOTA system is composed of four main modules -- mapper, global planner, local planner, and command-tracking controller -- in a hierarchical manner. In this paper, we build a robust and safe local planner which is designed to generate a velocity plan to track a coarsely planned path from the global planner. Previous works used waypoint-based methods (e.g. Proportional-Differential control and pure pursuit) which simplify the path tracking problem to local point-goal navigation. However, they suffer from frequent collisions in geometrically complex and narrow environments for two reasons: the global planner uses a coarse and inaccurate model, and the local planner is unable to track the global plan sufficiently well. Currently, deep learning methods are an appealing alternative because they can learn safety and path feasibility from experience more accurately. However, existing deep learning methods are not capable of planning for a long horizon. In this work, we propose a learning-based fully autonomous navigation framework composed of three innovative elements: a learned forward dynamics model (FDM), an online sampling-based model-predictive controller, and an informed trajectory sampler (ITS). Using our framework, a quadruped robot can autonomously navigate in various complex environments without collisions and generate a smoother command plan compared to the baseline method. Furthermore, our method can reactively handle unexpected obstacles on the planned path and avoid them. Project page https://awesomericky.github.io/projects/FDM_ITS_navigation/.
    Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images. (arXiv:2204.08454v3 [cs.CV] UPDATED)
    Remote-sensing (RS) Change Detection (CD) aims to detect "changes of interest" from co-registered bi-temporal images. The performance of existing deep supervised CD methods is attributed to the large amounts of annotated data used to train the networks. However, annotating large amounts of remote sensing images is labor-intensive and expensive, particularly with bi-temporal images, as it requires pixel-wise comparisons by a human expert. On the other hand, we often have access to unlimited unlabeled multi-temporal RS imagery thanks to ever-increasing earth observation programs. In this paper, we propose a simple yet effective way to leverage the information from unlabeled bi-temporal images to improve the performance of CD approaches. More specifically, we propose a semi-supervised CD model in which we formulate an unsupervised CD loss in addition to the supervised Cross-Entropy (CE) loss by constraining the output change probability map of a given unlabeled bi-temporal image pair to be consistent under the small random perturbations applied on the deep feature difference map that is obtained by subtracting their latent feature representations. Experiments conducted on two publicly available CD datasets show that the proposed semi-supervised CD method can reach closer to the performance of supervised CD even with access to as little as 10% of the annotated training data. Code available at https://github.com/wgcban/SemiCD
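    The unsupervised consistency term described above can be illustrated with a scalar toy model: change probabilities predicted from a perturbed feature-difference map are pulled towards those predicted from the clean one. Everything below (the linear `head`, the Gaussian perturbation, the MSE form) is a simplifying assumption for illustration, not the SemiCD implementation:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def consistency_loss(feat_a, feat_b, head, noise_std=0.1, seed=0):
    """Toy consistency loss: the change probability predicted from the
    perturbed feature difference should match the prediction from the
    clean difference. `head` maps a feature-difference value to a change
    logit; here it is a plain callable rather than a deep decoder."""
    rng = random.Random(seed)
    loss = 0.0
    for fa, fb in zip(feat_a, feat_b):
        diff = fa - fb                                  # latent feature difference
        p_clean = sigmoid(head(diff))
        p_pert = sigmoid(head(diff + rng.gauss(0.0, noise_std)))
        loss += (p_clean - p_pert) ** 2                 # MSE consistency term
    return loss / len(feat_a)

head = lambda d: 4.0 * d                                # hypothetical linear head
loss = consistency_loss([0.9, 0.1, 0.5], [0.1, 0.1, 0.4], head)
```

    No labels enter this term, which is what lets the unlabelled bi-temporal pairs contribute to training alongside the supervised cross-entropy loss.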
    Wrapped Distributions on homogeneous Riemannian manifolds. (arXiv:2204.09790v1 [math.ST])
    We provide a general framework for constructing probability distributions on Riemannian manifolds, taking advantage of area-preserving maps and isometries. Control over distributions' properties, such as parameters, symmetry and modality, yields a family of flexible distributions that are straightforward to sample from, suitable for use within Monte Carlo algorithms and latent variable models, such as autoencoders. As an illustration, we empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model. Finally, we take advantage of the generalized description of this framework to posit questions for future work.
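    The simplest instance of the wrapping construction is a wrapped normal on the circle: sample from a Euclidean normal and wrap into [0, 2*pi). A minimal sketch of that special case; the paper's framework covers general homogeneous Riemannian manifolds via area-preserving maps and isometries, not just S^1:

```python
import math
import random

def sample_wrapped_normal(mu, sigma, n, seed=0):
    """Sample from a wrapped normal on the circle S^1: draw from a
    Euclidean normal and push it through the quotient map x -> x mod 2*pi.
    Sampling stays trivial, which is the appeal of the construction for
    Monte Carlo algorithms and latent variable models."""
    rng = random.Random(seed)
    two_pi = 2.0 * math.pi
    return [rng.gauss(mu, sigma) % two_pi for _ in range(n)]

samples = sample_wrapped_normal(mu=1.0, sigma=0.3, n=5000)
```

    The circular mean of the samples recovers `mu`, and the density is a sum of normal densities over all windings, which stays tractable for likelihood-based training.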
    Debiased Learning from Naturally Imbalanced Pseudo-Labels. (arXiv:2201.01490v2 [cs.LG] UPDATED)
    Pseudo-labels are confident predictions made on unlabeled target data by a classifier trained on labeled source data. They are widely used for adapting a model to unlabeled data, e.g., in a semi-supervised learning setting. Our key insight is that pseudo-labels are naturally imbalanced due to intrinsic data similarity, even when a model is trained on balanced source data and evaluated on balanced target data. If we address this previously unknown imbalanced classification problem arising from pseudo-labels instead of ground-truth training labels, we could remove model biases towards false majorities created by pseudo-labels. We propose a novel and effective debiased learning method with pseudo-labels, based on counterfactual reasoning and adaptive margins: The former removes the classifier response bias, whereas the latter adjusts the margin of each class according to the imbalance of pseudo-labels. Validated by extensive experimentation, our simple debiased learning delivers significant accuracy gains over the state-of-the-art on ImageNet-1K: 26% for semi-supervised learning with 0.2% annotations and 9% for zero-shot learning. Our code is available at: https://github.com/frank-xwang/debiased-pseudo-labeling.
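    The adaptive-margin component can be sketched with standard logit adjustment: estimate class frequencies from the pseudo-labels and subtract a log-frequency margin from each logit, so pseudo-minority classes must no longer out-score an inflated majority. The constant `tau` and the exact margin form below are assumptions for illustration, not the paper's implementation:

```python
import math
from collections import Counter

def adaptive_margins(pseudo_labels, tau=1.0):
    """Per-class additive logit margins proportional to the log of the
    pseudo-label frequency, so that pseudo-majority classes must win by
    a larger margin before being predicted."""
    counts = Counter(pseudo_labels)
    total = sum(counts.values())
    return {c: tau * math.log(n / total) for c, n in counts.items()}

def debiased_logits(logits, margins):
    # Subtract each class margin (negative log-frequency) from its logit,
    # boosting classes that are rare among the pseudo-labels.
    return {c: z - margins.get(c, 0.0) for c, z in logits.items()}

# 90% of pseudo-labels say "cat": a borderline example gets flipped to "dog".
margins = adaptive_margins(["cat"] * 90 + ["dog"] * 10)
adjusted = debiased_logits({"cat": 1.0, "dog": 0.9}, margins)
```

    After adjustment the near-tied example is resolved in favour of the pseudo-minority class, which is the debiasing behaviour the abstract describes.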
    Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation. (arXiv:2204.10020v1 [eess.AS])
    Data augmentation via voice conversion (VC) has been successfully applied to low-resource expressive text-to-speech (TTS) when only neutral data for the target speaker are available. Although the quality of VC is crucial for this approach, it is challenging to learn a stable VC model because the amount of data is limited in low-resource scenarios, and highly expressive speech has large acoustic variety. To address this issue, we propose a novel data augmentation method that combines pitch-shifting and VC techniques. Because pitch-shift data augmentation enables the coverage of a variety of pitch dynamics, it greatly stabilizes training for both VC and TTS models, even when only 1,000 utterances of the target speaker's neutral data are available. Subjective test results showed that a FastSpeech 2-based emotional TTS system with the proposed method improved naturalness and emotional similarity compared with conventional methods.
    One-Step Abductive Multi-Target Learning with Diverse Noisy Samples: An Application to Tumour Segmentation for Breast Cancer. (arXiv:2110.10325v5 [cs.LG] UPDATED)
    Recent studies have demonstrated the effectiveness of the combination of machine learning and logical reasoning in inventing advanced artificial intelligence technologies. One-step abductive multi-target learning (OSAMTL), an approach that only combines machine learning and logical reasoning in a one-step balanced way, has also shown its effectiveness in handling complex noisy labels of a single noisy sample in medical histopathology whole slide image analysis (MHWSIA). However, OSAMTL is not suitable for the situation where diverse noisy samples (DiNS) are provided for a learning task. In this paper, after defining DiNS, we propose one-step abductive multi-target learning with DiNS (OSAMTL-DiNS) to extend the original OSAMTL to handle complex noisy labels of DiNS. Applying OSAMTL-DiNS to tumour segmentation for breast cancer in MHWSIA, we show that OSAMTL-DiNS is able to enable various state-of-the-art approaches for learning from noisy labels to achieve more rational predictions.
    Dynamical simulation via quantum machine learning with provable generalization. (arXiv:2204.10269v1 [quant-ph])
    Much attention has been paid to dynamical simulation and quantum machine learning (QML) independently as applications for quantum advantage, while the possibility of using QML to enhance dynamical simulations has not been thoroughly investigated. Here we develop a framework for using QML methods to simulate quantum dynamics on near-term quantum hardware. We use generalization bounds, which bound the error a machine learning model makes on unseen data, to rigorously analyze the training data requirements of an algorithm within this framework. This provides a guarantee that our algorithm is resource-efficient, both in terms of qubit and data requirements. Our numerics exhibit efficient scaling with problem size, and we simulate 20 times longer than Trotterization on IBMQ-Bogota.
    Handling Imbalanced Classification Problems With Support Vector Machines via Evolutionary Bilevel Optimization. (arXiv:2204.10231v1 [cs.LG])
    Support vector machines (SVMs) are popular learning algorithms to deal with binary classification problems. They traditionally assume equal misclassification costs for each class; however, real-world problems may have an uneven class distribution. This article introduces EBCS-SVM: evolutionary bilevel cost-sensitive SVMs. EBCS-SVM handles imbalanced classification problems by simultaneously learning the support vectors and optimizing the SVM hyperparameters, which comprise the kernel parameter and misclassification costs. The resulting optimization problem is a bilevel problem, where the lower level determines the support vectors and the upper level the hyperparameters. This optimization problem is solved using an evolutionary algorithm (EA) at the upper level and sequential minimal optimization (SMO) at the lower level. These two methods work in a nested fashion, that is, the optimal support vectors help guide the search of the hyperparameters, and the lower level is initialized based on previous successful solutions. The proposed method is assessed using 70 datasets of imbalanced classification and compared with several state-of-the-art methods. The experimental results, supported by a Bayesian test, provided evidence of the effectiveness of EBCS-SVM when working with highly imbalanced datasets.
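    The nested structure can be mimicked in a few lines: a lower level that fits a classifier for fixed misclassification costs (SMO on the SVM dual in the paper; a 1-D threshold rule here) and an upper level that evolves the minority-class cost against balanced accuracy. A hypothetical toy sketch of the bilevel idea, not the EBCS-SVM algorithm:

```python
import math
import random

def weighted_error(threshold, data, cost_pos, cost_neg):
    """Cost-sensitive training error of a 1-D threshold classifier."""
    err = 0.0
    for x, y in data:
        pred = 1 if x >= threshold else -1
        if pred != y:
            err += cost_pos if y == 1 else cost_neg
    return err

def lower_level(data, cost_pos, cost_neg):
    # Stand-in for SMO: fit the classifier for fixed misclassification costs.
    candidates = sorted(x for x, _ in data)
    return min(candidates, key=lambda t: weighted_error(t, data, cost_pos, cost_neg))

def balanced_accuracy(data, threshold):
    tp = sum(1 for x, y in data if y == 1 and x >= threshold)
    tn = sum(1 for x, y in data if y == -1 and x < threshold)
    npos = sum(1 for _, y in data if y == 1)
    nneg = len(data) - npos
    return 0.5 * (tp / npos + tn / nneg)

def upper_level(data, generations=40, seed=0):
    # Stand-in for the EA: a (1+1) evolutionary search over the
    # minority-class cost, scored by balanced accuracy.
    rng = random.Random(seed)
    best = 1.0
    best_fit = balanced_accuracy(data, lower_level(data, best, 1.0))
    for _ in range(generations):
        child = max(0.1, best * math.exp(rng.gauss(0.0, 0.5)))
        fit = balanced_accuracy(data, lower_level(data, child, 1.0))
        if fit >= best_fit:
            best, best_fit = child, fit
    return best, best_fit

# Toy imbalanced data: 10 negatives, 2 positives on the real line.
data = [(-2.0, -1), (-1.5, -1), (-1.0, -1), (-0.5, -1), (0.0, -1),
        (0.2, -1), (0.4, -1), (0.6, -1), (0.8, -1), (1.0, -1),
        (0.5, 1), (0.9, 1)]
best_cost, best_fit = upper_level(data)
```

    With equal costs the inner fit sacrifices a minority point; raising the minority cost shifts the decision boundary, which is exactly the degree of freedom the upper level searches over.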
    DooDLeNet: Double DeepLab Enhanced Feature Fusion for Thermal-color Semantic Segmentation. (arXiv:2204.10266v1 [cs.LG])
    In this paper we present a new approach for feature fusion between RGB and LWIR Thermal images for the task of semantic segmentation for driving perception. We propose DooDLeNet, a double DeepLab architecture with specialized encoder-decoders for thermal and color modalities and a shared decoder for final segmentation. We combine two strategies for feature fusion: confidence weighting and correlation weighting. We report state-of-the-art mean IoU results on the MF dataset.
    Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications. (arXiv:2202.08916v3 [eess.IV] UPDATED)
    Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The development of graph convolutional networks (GCNs) has created the opportunity to address this information complexity via graph-driven architectures, since GCNs can perform feature aggregation, interaction, and reasoning with remarkable flexibility and efficiency. These GCN capabilities have spawned a new wave of research in medical imaging analysis with the overarching goal of improving quantitative disease understanding, monitoring, and diagnosis. Yet daunting challenges remain for designing the important image-to-graph transformation for multi-modality medical imaging and gaining insights into model interpretation and enhanced clinical decision support. In this review, we present recent GCN developments in the context of medical image analysis, including imaging data from radiology and histopathology. We discuss the fast-growing use of graph network architectures in medical image analysis to improve disease diagnosis and patient outcomes in clinical practice. To foster cross-disciplinary research, we present GCN technical advancements and emerging medical applications, identify common challenges in the use of image-based GCNs and their extensions in model interpretation, and highlight large-scale benchmarks that promise to transform the scope of medical image studies and related graph-driven medical research.
    Physical Modeling using Recurrent Neural Networks with Fast Convolutional Layers. (arXiv:2204.10125v1 [cs.SD])
    Discrete-time modeling of acoustic, mechanical and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied the techniques of machine-learning to construct such models automatically from data for the case of systems which have lumped states described by scalar values, such as electrical circuits. In this work, we examine how similar techniques are able to construct models of systems which have spatially distributed rather than lumped states. We describe several novel recurrent neural network structures, and show how they can be thought of as an extension of modal techniques. As a proof of concept, we generate synthetic data for three physical systems and show that the proposed network structures can be trained with this data to reproduce the behavior of these systems.
    OCTOPUS -- optical coherence tomography plaque and stent analysis software. (arXiv:2204.10212v1 [eess.IV])
    Compared with other imaging modalities, intravascular optical coherence tomography (IVOCT) has significant advantages for guiding percutaneous coronary interventions. To aid IVOCT research studies, we developed the Optical Coherence TOmography PlaqUe and Stent (OCTOPUS) analysis software. To automate image analysis results, the software includes several important algorithmic steps: pre-processing, deep learning plaque segmentation, machine learning identification of stent struts, and registration of pullbacks. Interactive visualization and manual editing of segmentations were included in the software. Quantifications include stent deployment characteristics (e.g., stent strut malapposition), strut level analysis, calcium angle, and calcium thickness measurements. Interactive visualizations include (x,y) anatomical, en face, and longitudinal views with optional overlays. The underlying plaque segmentation algorithm yielded excellent pixel-wise results (86.2% sensitivity and 0.781 F1 score). Using OCTOPUS on 34 new pullbacks, we determined that following automated segmentation, only 13% and 23% of frames needed any manual touch up for detailed lumen and calcification labeling, respectively. Only up to 3.8% of plaque pixels were modified, leading to an average editing time of only 7.5 seconds/frame, an approximately 80% reduction compared to manual analysis. Regarding stent analysis, sensitivity and precision were both greater than 90%, and each strut was successfully classified as either covered or uncovered with high sensitivity (94%) and specificity (90%). We introduced and evaluated the clinical application of a highly automated software package, OCTOPUS, for quantitative plaque and stent analysis in IVOCT images. The software is currently used as an offline tool for research purposes; however, the software's embedded algorithms may also be useful for real-time treatment planning.
    Sketch2PQ: Freeform Planar Quadrilateral Mesh Design via a Single Sketch. (arXiv:2201.09367v3 [cs.GR] UPDATED)
    The freeform architectural modeling process often involves two important stages: concept design and digital modeling. In the first stage, architects usually briefly sketch the overall 3D shape and the panel layout on physical or digital paper. In the second stage, a digital 3D model is created using the sketch as a reference. The digital model needs to incorporate geometric requirements for its components, such as the planarity of panels due to consideration of construction costs, which can make the modeling process more challenging. In this work, we present a novel sketch-based system to bridge the concept design and digital modeling of freeform roof-like shapes represented as planar quadrilateral (PQ) meshes. Our system allows the user to sketch the surface boundary and contour lines under axonometric projection and supports the sketching of occluded regions. In addition, the user can sketch feature lines to provide directional guidance to the PQ mesh layout. Given the 2D sketch input, we propose a deep neural network to infer in real-time the underlying surface shape along with a dense conjugate direction field, both of which are used to extract the final PQ mesh. To train and validate our network, we generate a large synthetic dataset that mimics architect sketching of freeform quadrilateral patches. The effectiveness and usability of our system are demonstrated with quantitative and qualitative evaluation as well as user studies.
    Assessing Machine Learning Algorithms for Near-Real Time Bus Ridership Prediction During Extreme Weather. (arXiv:2204.09792v1 [stat.AP])
    Given an increasingly volatile climate, the relationship between weather and transit ridership has drawn increasing interest. However, challenges stemming from spatio-temporal dependency and non-stationarity have not been fully addressed in modelling and predicting transit ridership under the influence of weather conditions especially with the traditional statistical approaches. Drawing on three-month smart card data in Brisbane, Australia, this research adopts and assesses a suite of machine-learning algorithms, i.e., random forest, eXtreme Gradient Boosting (XGBoost) and Tweedie XGBoost, to model and predict near real-time bus ridership in relation to sudden change of weather conditions. The study confirms that there indeed exists a significant level of spatio-temporal variability of weather-ridership relationship, which produces equally dynamic patterns of prediction errors. Further comparison of model performance suggests that Tweedie XGBoost outperforms the other two machine-learning algorithms in generating overall more accurate prediction outcomes in space and time. Future research may advance the current study by drawing on larger data sets and applying more advanced machine and deep-learning approaches to provide more enhanced evidence for real-time operation of transit systems.
    The 2021 NIST Speaker Recognition Evaluation. (arXiv:2204.10242v1 [eess.AS])
    The 2021 Speaker Recognition Evaluation (SRE21) was the latest cycle of the ongoing evaluation series conducted by the U.S. National Institute of Standards and Technology (NIST) since 1996. It was the second large-scale multimodal speaker/person recognition evaluation organized by NIST (the first one being SRE19). Similar to SRE19, it featured two core evaluation tracks, namely audio and audio-visual, as well as an optional visual track. In addition to offering fixed and open training conditions, it also introduced new challenges for the community, thanks to a new multimodal (i.e., audio, video, and selfie images) and multilingual (i.e., with multilingual speakers) corpus, termed WeCanTalk, collected outside North America by the Linguistic Data Consortium (LDC). These challenges included: 1) trials (target and non-target) with enrollment and test segments originating from different domains (i.e., telephony versus video), and 2) trials (target and non-target) with enrollment and test segments spoken in different languages (i.e., cross-lingual trials). This paper presents an overview of SRE21 including the tasks, performance metric, data, evaluation protocol, results and system performance analyses. A total of 23 organizations (forming 15 teams) from academia and industry participated in SRE21 and submitted 158 valid system outputs. Evaluation results indicate: audio-visual fusion produces substantial gains in performance over audio-only or visual-only systems; top performing speaker and face recognition systems exhibited comparable performance under the matched domain conditions present in this evaluation; and, the use of complex neural network architectures (e.g., ResNet) along with angular losses with margin, data augmentation, as well as long duration fine-tuning contributed to notable performance improvements for the audio-only speaker recognition task.
    Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift. (arXiv:2204.08816v3 [astro-ph.GA] UPDATED)
    In this work we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state-of-the-art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularisation and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data-sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data-sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under data-set shift. We show that a class-imbalanced unlabelled data pool negatively affects performance through prior probability shift, which we suggest may explain this performance drop, and that using the Frechet Distance between labelled and unlabelled data-sets as a measure of data-set shift can provide a prediction of model performance, but that for typical radio galaxy data-sets with labelled sample volumes of O(1000), the sample variance associated with this technique is high and the technique is in general not sufficiently robust to replace a train-test cycle.
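    The Frechet-distance shift measure reduces, for two 1-D Gaussian fits, to sqrt((mu_1 - mu_2)^2 + (sigma_1 - sigma_2)^2). A minimal sketch of that special case; the paper applies the multivariate version to feature representations of the labelled and unlabelled pools:

```python
import math

def frechet_distance_1d(xs, ys):
    """Frechet distance between two 1-D Gaussians fitted to the samples:
    d = sqrt((mu_x - mu_y)^2 + (s_x - s_y)^2). The multivariate form
    replaces the second term with Tr(S_x + S_y - 2 (S_x S_y)^(1/2))."""
    def stats(v):
        m = sum(v) / len(v)
        var = sum((x - m) ** 2 for x in v) / len(v)
        return m, math.sqrt(var)
    mx, sx = stats(xs)
    my, sy = stats(ys)
    return math.sqrt((mx - my) ** 2 + (sx - sy) ** 2)
```

    Used on labelled versus unlabelled feature sets, a large value flags the kind of data-set shift under which the SSL performance drop was observed; as the abstract notes, at labelled sample volumes of O(1000) the estimate itself carries high sample variance.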
    Linear convergence of a policy gradient method for finite horizon continuous time stochastic control problems. (arXiv:2203.11758v2 [math.OC] UPDATED)
    Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for general continuous space-time stochastic control problems has been elusive. This paper closes the gap by proposing a proximal gradient algorithm for feedback controls of finite-time horizon stochastic control problems. The state dynamics are continuous time nonlinear diffusions with controlled drift and possibly degenerate noise, and the objectives are nonconvex in the state and nonsmooth in the control. We prove under suitable conditions that the algorithm converges linearly to a stationary point of the control problem, and is stable with respect to policy updates by approximate gradient steps. The convergence result justifies the recent reinforcement learning heuristics that adding entropy regularization or a fictitious discount factor to the optimization objective accelerates the convergence of policy gradient methods. The proof exploits careful regularity estimates of backward stochastic differential equations.
    Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation. (arXiv:2110.10461v3 [cs.LG] UPDATED)
    Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.
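    The flavour of one-pass hypergradient method that this line of work builds on can be shown on a quadratic: for plain SGD the hypergradient of the loss with respect to the learning rate is -<g_t, g_{t-1}>, so the rate is adapted online during a single training run with no restarts. This is the classic hypergradient-descent sketch under that assumption, not the paper's generalized weight-update formulation:

```python
def hypergradient_sgd(grad, w0, lr0=0.01, hyper_lr=1e-3, steps=200):
    """One-pass hypergradient adaptation of a scalar learning rate:
    approximate dL/d(lr) at step t by -<g_t, g_{t-1}> (valid for plain
    SGD) and take a gradient step on the learning rate itself, so the
    hyperparameter is tuned within the same training episode."""
    w = list(w0)
    lr = lr0
    prev_g = None
    for _ in range(steps):
        g = grad(w)
        if prev_g is not None:
            # Gradient step on lr: lr -= hyper_lr * dL/d(lr)
            #                         = lr + hyper_lr * <g_t, g_{t-1}>.
            lr += hyper_lr * sum(a * b for a, b in zip(g, prev_g))
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        prev_g = g
    return w, lr

# Quadratic loss L(w) = 0.5 * ||w||^2, so grad(w) = w.
w, lr = hypergradient_sgd(lambda w: list(w), [5.0, -3.0])
```

    On this quadratic the learning rate grows while consecutive gradients stay aligned and the iterates converge to the optimum, all in one pass.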
    Fink: early supernovae Ia classification using active learning. (arXiv:2111.11438v2 [astro-ph.IM] UPDATED)
    We describe how the Fink broker early supernova Ia classifier optimizes its ML classifications by employing an active learning (AL) strategy. We demonstrate the feasibility of implementation of such strategies in the current Zwicky Transient Facility (ZTF) public alert data stream. We compare the performance of two AL strategies: uncertainty sampling and random sampling. Our pipeline consists of 3 stages: feature extraction, classification and learning strategy. Starting from an initial sample of 10 alerts (5 SN Ia and 5 non-Ia), we let the algorithm identify which alert should be added to the training sample. The system is allowed to evolve through 300 iterations. Our data set consists of 23 840 alerts from the ZTF with confirmed classification via cross-match with the SIMBAD database and the Transient Name Server (TNS), 1 600 of which were SNe Ia (1 021 unique objects). The data configuration, after the learning cycle was completed, consists of 310 alerts for training and 23 530 for testing. Averaging over 100 realizations, the classifier achieved 89% purity and 54% efficiency. From 01/November/2020 to 31/October/2021 Fink has applied its early supernova Ia module to the ZTF stream and communicated promising SN Ia candidates to the TNS. Of the 535 spectroscopically classified Fink candidates, 459 (86%) were proven to be SNe Ia. Our results confirm the effectiveness of active learning strategies for guiding the construction of optimal training samples for astronomical classifiers. It demonstrates in real data that the performance of learning algorithms can be highly improved without the need of extra computational resources or overwhelmingly large training samples. This is, to our knowledge, the first application of AL to real alert data.
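    Uncertainty sampling, one of the two AL strategies compared above, simply queries the alerts whose predicted SN Ia probability has the highest entropy, i.e. the ones the current classifier is least sure about. A minimal sketch (the alert ids and probabilities below are invented for illustration):

```python
import math

def entropy(p):
    """Binary prediction entropy; eps guards against log(0)."""
    eps = 1e-12
    return -(p * math.log(p + eps) + (1.0 - p) * math.log(1.0 - p + eps))

def uncertainty_sampling(pool_probs, batch=1):
    """Rank the unlabelled pool by prediction entropy and return the
    `batch` most uncertain alert ids; these are the ones sent for
    spectroscopic follow-up and then added to the training sample."""
    ranked = sorted(pool_probs, key=lambda a: entropy(pool_probs[a]), reverse=True)
    return ranked[:batch]

queried = uncertainty_sampling({"alert_1": 0.97, "alert_2": 0.51, "alert_3": 0.08})
```

    Iterating query, label, retrain for 300 rounds is the learning-strategy stage of the pipeline; random sampling replaces the entropy ranking with a uniform draw.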
    Path sampling of recurrent neural networks by incorporating known physics. (arXiv:2203.00597v2 [cond-mat.dis-nn] UPDATED)
    Recurrent neural networks have seen widespread use in modeling dynamical systems in varied domains such as weather prediction, text prediction and several others. Often one wishes to supplement the experimentally observed dynamics with prior knowledge or intuition about the system. While the recurrent nature of these networks allows them to model arbitrarily long memories in the time series used in training, it makes it harder to impose prior knowledge or intuition through generic constraints. In this work, we present a path sampling approach based on the principle of Maximum Caliber that allows us to include generic thermodynamic or kinetic constraints into recurrent neural networks. We demonstrate the method for a widely used type of recurrent neural network, the long short-term memory network, in the context of supplementing time series collected from different application domains. These include classical Molecular Dynamics of a protein and Monte Carlo simulations of an open quantum system continuously losing photons to the environment and displaying Rabi oscillations. Our method can be easily generalized to other generative artificial intelligence models and to generic time series in different areas of the physical and social sciences, where one wishes to supplement limited data with intuition- or theory-based corrections.
    Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?. (arXiv:2204.09664v2 [cs.LG] UPDATED)
    We study the theory of neural networks (NNs) through the lens of classical nonparametric regression problems, with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function spaces and sample sizes. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0<p<1$) of the coefficient vector over an end-to-end learned dictionary of function bases. Using this equivalence, we further establish that by tuning only the weight decay, such Parallel NNs achieve an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, they get exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
    An Improved Transfer Model: Randomized Transferable Machine. (arXiv:2011.13629v2 [cs.LG] UPDATED)
    Feature-based transfer is one of the most effective methodologies for transfer learning. Existing studies usually assume that the learned new feature representation is \emph{domain-invariant}, and thus train a transfer model $\mathcal{M}$ on the source domain. In this paper, we consider a more realistic scenario where the new feature representation is suboptimal and small divergence still exists across domains. We propose a new transfer model called the Randomized Transferable Machine (RTM) to handle such small divergences between domains. Specifically, we work on the new source and target data learned from existing feature-based transfer methods. The key idea is to enlarge the source training data population by randomly corrupting the new source data with noise, and then train a transfer model $\widetilde{\mathcal{M}}$ that performs well on all the corrupted source data populations. In principle, the more corruptions are made, the higher the probability that the new target data are covered by the constructed source data populations, and thus the better the transfer performance achieved by $\widetilde{\mathcal{M}}$. The ideal case involves infinite corruptions, which is infeasible in practice. We develop a marginalized solution that trains an $\widetilde{\mathcal{M}}$ without performing any corruption explicitly, yet is equivalent to training with infinitely many noisy source data populations. We further propose two instantiations of $\widetilde{\mathcal{M}}$, which we theoretically show to be superior in transfer to the conventional transfer model $\mathcal{M}$. More importantly, both instantiations have closed-form solutions, leading to a fast and efficient training process. Experiments on various real-world transfer tasks show that RTM is a promising transfer model.
    Towards Deepening Graph Neural Networks: A GNTK-based Optimization Perspective. (arXiv:2103.03113v3 [cs.LG] UPDATED)
    Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data. Nevertheless, it is well known that deep GCNs suffer from the over-smoothing problem, where node representations tend to be indistinguishable as more layers are stacked up. The theoretical research to date on deep GCNs has focused primarily on expressive power rather than trainability, an optimization perspective. Compared to expressivity, trainability attempts to address a more fundamental question: Given a sufficiently expressive space of models, can we successfully find a good solution via gradient descent-based optimizers? This work fills this gap by exploiting the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory under gradient descent for wide GCNs. We formulate the asymptotic behaviors of the GNTK in the large-depth limit, which enables us to reveal that the trainability of wide and deep GCNs drops at an exponential rate during optimization. Additionally, we extend our theoretical framework to analyze residual-connection-based techniques, which are found to be able to only mildly mitigate the exponential decay of trainability. Inspired by our theoretical insights on trainability, we propose Critical DropEdge, a connectivity-aware and graph-adaptive sampling method, to alleviate the exponential decay problem more fundamentally. Experimental evaluation consistently confirms that our proposed method achieves better results than relevant counterparts in both the infinite-width and finite-width regimes.
    Hybrid Memoised Wake-Sleep: Approximate Inference at the Discrete-Continuous Interface. (arXiv:2107.06393v2 [cs.CV] UPDATED)
    Modeling complex phenomena typically involves the use of both discrete and continuous variables. Such a setting applies across a wide range of problems, from identifying trends in time-series data to performing effective compositional scene understanding in images. Here, we propose Hybrid Memoised Wake-Sleep (HMWS), an algorithm for effective inference in such hybrid discrete-continuous models. Prior approaches to learning suffer as they need to perform repeated expensive inner-loop discrete inference. We build on a recent approach, Memoised Wake-Sleep (MWS), which alleviates part of the problem by memoising discrete variables, and extend it to allow for a principled and effective way to handle continuous variables by learning a separate recognition model used for importance-sampling based approximate inference and marginalization. We evaluate HMWS in the GP-kernel learning and 3D scene understanding domains, and show that it outperforms current state-of-the-art inference methods.
    Out-of-distribution generalization for learning quantum dynamics. (arXiv:2204.10268v1 [quant-ph])
    Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are assumed to be drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a distribution different from the training distribution. In this work, we prove out-of-distribution generalization for the task of learning an unknown unitary using a QNN and for a broad class of training and testing distributions. In particular, we show that one can learn the action of a unitary on entangled states using only product state training data. We numerically illustrate this by showing that the evolution of a Heisenberg spin chain can be learned using only product training states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics using near-term quantum computers and quantum experiments, and further opens up new methods for both the classical and quantum compilation of quantum circuits.
    Anti-Jamming Games in Multi-Band Wireless Ad Hoc Networks. (arXiv:2111.11178v2 [cs.IT] UPDATED)
    For multi-band wireless ad hoc networks of multiple users, an anti-jamming game between the users and a jammer is studied. In this game, the users (resp. jammer) want to maximize (resp. minimize) the expected rewards of the users taking into account various factors such as communication rate, hopping cost, and jamming loss. We analyze the arms race of the game and derive an optimal frequency hopping policy at each stage of the arms race based on the Markov decision process (MDP). It is analytically shown that the arms race reaches an equilibrium after a few rounds, and a frequency hopping policy and a jamming strategy at the equilibrium are characterized. We propose two kinds of collision avoidance protocols to ensure that at most one user communicates in each frequency band, and provide various numerical results that show the effects of the reward parameters and collision avoidance protocols on the optimal frequency hopping policy and the expected rewards at the equilibrium. Moreover, we discuss equilibria for the case where the jammer adopts some unpredictable jamming strategies.
    Distributed Learning for Vehicular Dynamic Spectrum Access in Autonomous Driving. (arXiv:2204.10179v1 [cs.NI])
    Reliable wireless communication between autonomously driving cars is one of the fundamental requirements for guaranteeing passenger safety and comfort. However, when the number of communicating cars increases, the transmission quality may be significantly degraded due to excessive occupancy of the shared radio frequency band. In this paper, we concentrate on the autonomous vehicle-platooning use case, where intra-platoon communication takes place in a dynamically selected frequency band other than the one nominally devoted to such purposes. The carrier selection is done in a flexible manner with the support of a context database located at the roadside unit (the edge of the wireless communication infrastructure). However, as the database delivers only context information to the platoons' leaders, the final decision is made separately by the individual platoons, following the suggestions made by artificial intelligence algorithms. In this work, we concentrate on a lightweight Q-learning solution that could be implemented in each car for dynamic channel selection.
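    A minimal sketch of the kind of tabular Q-learning update a platoon leader could run for dynamic channel selection; the state encoding, reward signal, and hyperparameter values here are illustrative assumptions, not the paper's exact formulation:

```python
import random

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q[state][action] toward the
    bootstrapped target reward + gamma * max_a' Q[next_state][a']."""
    best_next = max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

def select_channel(Q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy channel choice over the per-state Q row:
    explore a random band with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[state]))
    return max(range(len(Q[state])), key=lambda a: Q[state][a])
```

    The reward here would combine observed throughput and interference on the chosen band; being lightweight and table-based, this fits the per-car deployment the abstract envisions.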
    Merging of neural networks. (arXiv:2204.09973v1 [cs.LG])
    We propose a simple scheme for merging two neural networks trained with different starting initialization into a single one with the same size as the original ones. We do this by carefully selecting channels from each input network. Our procedure might be used as a finalization step after one tries multiple starting seeds to avoid an unlucky one. We also show that training two networks and merging them leads to better performance than training a single network for an extended period of time. Availability: https://github.com/fmfi-compbio/neural-network-merging
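    A toy illustration of merging by channel selection, under the assumption that each channel carries some importance score (e.g. a weight norm); the paper's actual selection criterion may differ:

```python
def merge_channels(net_a, net_b, keep):
    """Hypothetical channel-selection merge: each input is a list of
    (channel_id, score) pairs for one layer of a trained network;
    keep the `keep` highest-scoring channels from the union, so the
    merged layer has the same width as each original layer."""
    pool = [("a", cid, score) for cid, score in net_a]
    pool += [("b", cid, score) for cid, score in net_b]
    pool.sort(key=lambda item: item[2], reverse=True)
    return pool[:keep]
```

    Setting `keep` to the original layer width preserves the model size, matching the abstract's claim that the merged network is the same size as its inputs.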
    OUR-GAN: One-shot Ultra-high-Resolution Generative Adversarial Networks. (arXiv:2202.13799v2 [cs.CV] UPDATED)
    We propose OUR-GAN, the first one-shot ultra-high-resolution (UHR) image synthesis framework that generates non-repetitive images with 4K or higher resolution from a single training image. OUR-GAN generates a visually coherent image at low resolution and then gradually increases the resolution by super-resolution. Since OUR-GAN learns from a real UHR image, it can synthesize large-scale shapes with fine details while maintaining long-range coherence, which is difficult with conventional generative models that generate large images based on the patch distribution learned from relatively small images. OUR-GAN applies seamless subregion-wise super-resolution that synthesizes 4K or higher UHR images with limited memory, preventing discontinuity at the boundary. Additionally, OUR-GAN improves visual coherence while maintaining diversity by adding vertical positional embeddings to the feature maps. In experiments on the ST4K and RAISE datasets, OUR-GAN exhibited improved fidelity, visual coherence, and diversity compared with existing methods. The synthesized images are presented at https://anonymous-62348.github.io.
    A System for Interactive Examination of Learned Security Policies. (arXiv:2204.01126v2 [cs.CR] UPDATED)
    We present a system for interactive examination of learned security policies. It allows a user to traverse episodes of Markov decision processes in a controlled manner and to track the actions triggered by security policies. Similar to a software debugger, a user can continue or halt an episode at any time step and inspect parameters and probability distributions of interest. The system enables insight into the structure of a given policy and into the behavior of a policy in edge cases. We demonstrate the system with a network intrusion use case. We examine the evolution of an IT infrastructure's state and the actions prescribed by security policies while an attack occurs. The policies for the demonstration have been obtained through a reinforcement learning approach that includes a simulation system where policies are incrementally learned and an emulation system that produces statistics that drive the simulation runs.
    INSPIRE: Distributed Bayesian Optimization for ImproviNg SPatIal REuse in Dense WLANs. (arXiv:2204.10184v1 [cs.NI])
    WLANs, which have overtaken wired networks to become the primary means of connecting devices to the Internet, are prone to performance issues due to the scarcity of space in the radio spectrum. As a response, IEEE 802.11ax and subsequent amendments aim at increasing the spatial reuse of a radio channel by allowing the dynamic update of two key parameters in wireless transmission: the transmission power (TX_POWER) and the sensitivity threshold (OBSS_PD). In this paper, we present INSPIRE, a distributed solution performing local Bayesian optimizations based on Gaussian processes to improve the spatial reuse in WLANs. INSPIRE makes no explicit assumptions about the topology of WLANs and favors altruistic behaviors of the access points, leading them to find adequate configurations of their TX_POWER and OBSS_PD parameters for the "greater good" of the WLANs. We demonstrate the superiority of INSPIRE over other state-of-the-art strategies using the ns-3 simulator and two examples inspired by real-life deployments of dense WLANs. Our results show that, in only a few seconds, INSPIRE is able to drastically increase the quality of service of operational WLANs by improving their fairness and throughput.
    BABD: A Bitcoin Address Behavior Dataset for Address Behavior Pattern Analysis. (arXiv:2204.05746v2 [cs.CR] UPDATED)
    Cryptocurrencies are no longer just the preferred option for cybercriminal activities on darknets, due to their increasing adoption in mainstream applications. This is partly due to the transparency associated with the underpinning ledgers, where any individual can access the record of a transaction on the public ledger. In this paper, we build a dataset comprising Bitcoin transactions between 12 July 2019 and 26 May 2021. This dataset (hereafter referred to as BABD-13) contains 13 types of Bitcoin addresses, 5 categories of indicators with 148 features, and 544,462 labeled samples. We then use our proposed dataset with common machine learning models, namely: the k-nearest neighbors algorithm, decision tree, random forest, multilayer perceptron, and XGBoost. The results show that the accuracy rates of these machine learning models on our proposed dataset are between 93.24% and 96.71%. We also analyze the proposed features and their relationships from the experiments, and propose a k-hop subgraph generation algorithm to extract a k-hop subgraph from the entire Bitcoin transaction graph, constructed as a directed heterogeneous multigraph, starting from a specific Bitcoin address node (e.g., a known transaction associated with a criminal investigation).
    How Well Do Sparse Imagenet Models Transfer?. (arXiv:2111.13445v5 [cs.CV] UPDATED)
    Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" specialized datasets. Generally, more accurate models on the "upstream" dataset tend to provide better transfer accuracy "downstream". In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned - that is, compressed by sparsifying their connections. We consider transfer using unstructured pruned models obtained by applying several state-of-the-art pruning methods, including magnitude-based, second-order, re-growth, lottery-ticket, and regularization approaches, in the context of twelve standard transfer tasks. In a nutshell, our study shows that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities, and, while doing so, can lead to significant inference and even training speedups. At the same time, we observe and analyze significant differences in the behaviour of different pruning methods.
    Towards Resolving Propensity Contradiction in Offline Recommender Learning. (arXiv:1910.07295v6 [stat.ML] UPDATED)
    We study offline recommender learning from explicit rating feedback in the presence of selection bias. A current promising solution for the bias is the inverse propensity score (IPS) estimation. However, the performance of existing propensity-based methods can suffer significantly from the propensity estimation bias. In fact, most of the previous IPS-based methods require some amount of missing-completely-at-random (MCAR) data to accurately estimate the propensity. This leads to a critical self-contradiction; IPS is ineffective without MCAR data, even though it originally aims to learn recommenders from only missing-not-at-random feedback. To resolve this propensity contradiction, we derive a propensity-independent generalization error bound and propose a novel algorithm to minimize the theoretical bound via adversarial learning. Our theory and algorithm do not require a propensity estimation procedure, thereby leading to a well-performing rating predictor without the true propensity information. Extensive experiments demonstrate that the proposed approach is superior to a range of existing methods both in rating prediction and ranking metrics in practical settings without MCAR data.
    Energy-Efficient Parking Analytics System using Deep Reinforcement Learning. (arXiv:2202.08973v2 [cs.CV] UPDATED)
    Advances in deep vision techniques and ubiquity of smart cameras will drive the next generation of video analytics. However, video analytics applications consume vast amounts of energy as both deep learning techniques and cameras are power-hungry. In this paper, we focus on a parking video analytics platform and propose RL-CamSleep, a deep reinforcement learning-based technique, to actuate the cameras to reduce the energy footprint while retaining the system's utility. Our key insight is that many video-analytics applications do not always need to be operational, and we can design policies to activate video analytics only when necessary. Moreover, our work is complementary to existing work that focuses on improving hardware and software efficiency. We evaluate our approach on a city-scale parking dataset having 76 streets spread across the city. Our analysis demonstrates how streets have various parking patterns, highlighting the importance of an adaptive policy. Our approach can learn such an adaptive policy that can reduce the average energy consumption by 76.38% and achieve an average accuracy of more than 98% in performing video analytics.
    The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization. (arXiv:2110.07732v3 [cs.LG] UPDATED)
    Despite progress across a broad range of applications, Transformers have limited success in systematic generalization. The situation is especially frustrating in the case of algorithmic tasks, where they often fail to find intuitive solutions that route relevant information to the right node/operation at the right time in the grid represented by Transformer columns. To facilitate the learning of useful control flow, we propose two modifications to the Transformer architecture, copy gate and geometric attention. Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the classic compositional table lookup task, as well as near-perfect accuracy on the simple arithmetic task and a new variant of ListOps testing for generalization across computational depths. NDR's attention and gating patterns tend to be interpretable as an intuitive form of neural routing. Our code is public.
    On the Certified Robustness for Ensemble Models and Beyond. (arXiv:2107.10873v2 [cs.LG] UPDATED)
    Recent studies show that deep neural networks (DNNs) are vulnerable to adversarial examples, which aim to mislead DNNs by adding perturbations of small magnitude. To defend against such attacks, both empirical and theoretical defense approaches have been extensively studied for a single ML model. In this work, we aim to analyze and provide certified robustness for ensemble ML models, together with sufficient and necessary conditions of robustness for different ensemble protocols. Although ensemble models are empirically shown to be more robust than a single model, surprisingly, we find that in terms of certified robustness, standard ensemble models achieve only marginal improvement over a single model. Thus, to explore the conditions that guarantee certifiably robust ensemble ML models, we first prove that diversified gradients and large confidence margins are sufficient and necessary conditions for certifiably robust ensemble models under the model-smoothness assumption. We then provide a bounded model-smoothness analysis based on the proposed Ensemble-before-Smoothing strategy. We also prove that an ensemble model can always achieve higher certified robustness than a single base model under mild conditions. Inspired by the theoretical findings, we propose the lightweight Diversity Regularized Training (DRT) to train certifiably robust ensemble ML models. Extensive experiments show that our DRT-enhanced ensembles can consistently achieve higher certified robustness than existing single and ensemble ML models, demonstrating state-of-the-art certified L2-robustness on the MNIST, CIFAR-10, and ImageNet datasets.
    A Survey and Perspective on Artificial Intelligence for Security-Aware Electronic Design Automation. (arXiv:2204.09579v2 [cs.LG] UPDATED)
    Artificial intelligence (AI) and machine learning (ML) techniques have been increasingly used in several fields to improve performance and the level of automation. In recent years, this use has increased exponentially due to advances in high-performance computing and the ever-increasing size of data. One such field is hardware design, specifically the design of digital and analog integrated circuits (ICs), where AI/ML techniques have been extensively used to address ever-increasing design complexity, aggressive time-to-market, and the growing number of ubiquitous interconnected devices (IoT). However, the security concerns and issues related to IC design have been largely overlooked. In this paper, we summarize the state-of-the-art in AI/ML for circuit design/optimization, security and engineering challenges, research in security-aware CAD/EDA, and future research directions and needs for using AI/ML for security-aware circuit design.
    ESS: Learning Event-based Semantic Segmentation from Still Images. (arXiv:2203.10016v1 [cs.CV] CROSS LISTED)
    Retrieving accurate semantic information in challenging high dynamic range (HDR) and high-speed conditions remains an open challenge for image-based algorithms due to severe image degradations. Event cameras promise to address these challenges since they feature a much higher dynamic range and are resilient to motion blur. Nonetheless, semantic segmentation with event cameras is still in its infancy, chiefly due to the novelty of the sensor and the lack of high-quality, labeled datasets. In this work, we introduce ESS, which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). Compared to existing UDA methods, our approach aligns recurrent, motion-invariant event embeddings with image embeddings. For this reason, our method neither requires video data nor per-pixel alignment between images and events and, crucially, does not need to hallucinate motion from still images. Additionally, to spur further research in event-based semantic segmentation, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels. We show that using image labels alone, ESS outperforms existing UDA approaches, and when combined with event labels, it even outperforms state-of-the-art supervised approaches on both DDD17 and DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount of existing labeled image datasets and paves the way for new and exciting research directions in new fields previously inaccessible for event cameras.
    Modeling and Predicting Popularity Dynamics via Deep Learning Attention Mechanism. (arXiv:1811.02117v2 [cs.SI] UPDATED)
    An ability to predict the popularity dynamics of individual items within a complex evolving system has important implications in a wide range of domains. Here we propose a deep learning attention mechanism to model the process through which individual items gain their popularity. We analyze the interpretability of the model with the four key phenomena confirmed independently in the previous studies of long-term popularity dynamics quantification, including the intrinsic quality, the aging effect, the recency effect and the Matthew effect. We analyze the effectiveness of introducing the attention model in popularity dynamics prediction. Extensive experiments on a large real-world citation dataset demonstrate that the designed deep learning attention mechanism possesses remarkable power at predicting long-term popularity dynamics. It consistently outperforms the existing methods, and achieves a significant performance improvement.
    Deep Bayesian Active Learning, A Brief Survey on Recent Advances. (arXiv:2012.08044v2 [cs.LG] UPDATED)
    Active learning frameworks offer efficient data annotation without significant accuracy degradation. In other words, active learning starts training the model with a small amount of labeled data while exploring the space of unlabeled data in order to select the most informative samples to be labeled. Generally speaking, representing uncertainty is crucial in any active learning framework; however, standard deep learning methods are not capable of either representing or manipulating model uncertainty. On the other hand, from a real-world application perspective, uncertainty representation is getting more and more attention in the machine learning community. Deep Bayesian active learning frameworks, and Bayesian active learning settings in general, provide a practical means of representing model uncertainty, allowing training with small amounts of data while guiding further efficient training. In this paper, we briefly survey recent advances in Bayesian active learning and in particular deep Bayesian active learning frameworks.
    Addressing Tactic Volatility in Self-Adaptive Systems Using Evolved Recurrent Neural Networks and Uncertainty Reduction Tactics. (arXiv:2204.10308v1 [cs.LG])
    Self-adaptive systems frequently use tactics to perform adaptations. Tactic examples include the implementation of additional security measures when an intrusion is detected, or activating a cooling mechanism when temperature thresholds are surpassed. Tactic volatility occurs in real-world systems and is defined as variable behavior in the attributes of a tactic, such as its latency or cost. A system's inability to effectively account for tactic volatility adversely impacts its efficiency and resiliency against the dynamics of real-world environments. To help systems remain efficient in the face of tactic volatility, we propose a Tactic Volatility Aware (TVA-E) process utilizing evolved Recurrent Neural Networks (eRNN) to provide accurate tactic predictions. TVA-E is also the first known process to take advantage of uncertainty reduction tactics to provide additional information to the decision-making process and reduce uncertainty. TVA-E easily integrates into popular adaptation processes, enabling it to immediately benefit a large number of existing self-adaptive systems. Simulations using 52,106 tactic records demonstrate that: I) eRNN is an effective prediction mechanism, II) TVA-E represents an improvement over existing state-of-the-art processes in accounting for tactic volatility, and III) uncertainty reduction tactics are beneficial in accounting for tactic volatility. The developed dataset and tool can be found at https://tacticvolatility.github.io/
    Lessons on Parameter Sharing across Layers in Transformers. (arXiv:2104.06022v3 [cs.CL] UPDATED)
    We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique that shares the parameters of one layer across all layers, as in Universal Transformers (Dehghani et al., 2019), to increase computational efficiency. We propose three strategies, Sequence, Cycle, and Cycle (rev), for assigning parameters to each layer. Experimental results show that the proposed strategies are efficient in terms of parameter size and computational time. Moreover, we show that the proposed strategies are also effective in configurations with large training datasets, such as the recent WMT competition.
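    A sketch of how the three strategies might map layers to shared parameter groups, inferred from the strategy names; the paper's exact assignment (especially for Cycle (rev)) may differ in its details:

```python
def assign_layers(strategy, n_layers, n_params):
    """Map each of n_layers layers to one of n_params shared
    parameter groups; returns one group index per layer."""
    if strategy == "sequence":     # consecutive layers share: 0,0,1,1,2,2
        per = max(1, n_layers // n_params)
        return [min(i // per, n_params - 1) for i in range(n_layers)]
    if strategy == "cycle":        # repeat groups in order: 0,1,2,0,1,2
        return [i % n_params for i in range(n_layers)]
    if strategy == "cycle_rev":    # cycle, then reuse in reverse: 0,1,2,2,1,0
        half = [i % n_params for i in range(n_layers // 2)]
        return half + half[::-1] + ([0] if n_layers % 2 else [])
    raise ValueError(f"unknown strategy: {strategy}")
```

    With `n_params == 1` all three strategies collapse to the Universal-Transformer-style full sharing that the paper relaxes.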
    DropMessage: Unifying Random Dropping for Graph Neural Networks. (arXiv:2204.10037v1 [cs.LG])
    Graph Neural Networks (GNNs) are powerful tools for graph representation learning. Despite their rapid development, GNNs also face some challenges, such as over-fitting, over-smoothing, and non-robustness. Previous works indicate that these problems can be alleviated by random dropping methods, which integrate noise into models by randomly masking parts of the input. However, some open problems of random dropping on GNNs remain to be solved. First, it is challenging to find a universal method suitable for all cases, given the divergence of different datasets and models. Second, random noise introduced into GNNs causes incomplete coverage of parameters and an unstable training process. In this paper, we propose a novel random dropping method called DropMessage, which performs the dropping operation directly on the message matrix and can be applied to any message-passing GNN. Furthermore, we elaborate on the superiority of DropMessage: it stabilizes the training process by reducing sample variance, and it preserves information diversity from the perspective of information theory, making it a theoretical upper bound of other methods. Also, we unify existing random dropping methods into our framework and analyze their effects on GNNs. To evaluate our proposed method, we conduct experiments targeting multiple tasks on five public datasets and two industrial datasets with various backbone models. The experimental results show that DropMessage has the advantages of both effectiveness and generalization.
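    The core dropping operation can be sketched as follows, assuming a dense message matrix represented as nested lists; real implementations would operate on tensors inside the message-passing step:

```python
import random

def drop_message(messages, p, rng=random):
    """DropMessage-style masking: zero each entry of the message
    matrix independently with probability p, rescaling survivors by
    1/(1-p) so the expected message value is unchanged (as in
    inverted dropout)."""
    scale = 1.0 / (1.0 - p)
    return [[0.0 if rng.random() < p else m * scale for m in row]
            for row in messages]
```

    Dropping entries of the message matrix, rather than whole nodes or edges, is what distinguishes this scheme from DropNode/DropEdge-style methods and is the source of the variance-reduction argument in the abstract.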
    STONet: A Neural-Operator-Driven Spatio-temporal Network. (arXiv:2204.08414v2 [cs.LG] UPDATED)
    Graph-based spatio-temporal neural networks are effective at modeling the spatial dependency among discrete points sampled irregularly from unstructured grids, thanks to the great expressiveness of graph neural networks. However, these models are usually spatially-transductive -- they only fit the signals of the discrete spatial nodes fed into the model and cannot generalize zero-shot to 'unseen' spatial points. In comparison, for forecasting tasks on continuous space, such as temperature prediction on the earth's surface, the spatially-inductive property allows the model to generalize to any point in the spatial domain, demonstrating the model's ability to learn the underlying mechanisms or physical laws of the system rather than simply fit the signals. Besides, in the temporal domain, irregularly-sampled time series, e.g. data with missing values, require models to be temporally-continuous. Motivated by these two issues, we propose a spatio-temporal framework based on neural operators for PDEs, which learns the underlying mechanisms governing the dynamics of spatially-continuous physical quantities. Experiments show our model's improved performance in forecasting spatially-continuous physical quantities, its superior generalization to unseen spatial points, and its ability to handle temporally-irregular data.  ( 2 min )
    MRAM-based Analog Sigmoid Function for In-memory Computing. (arXiv:2204.09918v1 [cs.ET])
    We propose an analog implementation of the transcendental activation function leveraging two spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices and a CMOS inverter. The proposed analog neuron circuit consumes 1.8-27x less power and occupies a 2.5-4931x smaller area compared to state-of-the-art analog and digital implementations. Moreover, the developed neuron can be readily integrated with memristive crossbars without requiring any intermediate signal conversion units. Architecture-level analyses show that a fully-analog in-memory computing (IMC) circuit that uses our SOT-MRAM neuron along with an SOT-MRAM based crossbar can achieve more than 1.1x, 12x, and 13.3x reductions in power, latency, and energy, respectively, compared to a mixed-signal implementation with analog memristive crossbars and digital neurons. Finally, through cross-layer analyses, we provide a guide on how varying the device-level parameters of our neuron can affect the accuracy of a multilayer perceptron (MLP) on MNIST classification.  ( 2 min )
    Scale Dependencies and Self-Similarity Through Wavelet Scattering Covariance. (arXiv:2204.10177v1 [physics.data-an])
    We introduce a scattering covariance matrix which provides non-Gaussian models of time-series having stationary increments. A complex wavelet transform computes signal variations at each scale. Dependencies across scales are captured by the joint covariance across time and scales of complex wavelet coefficients and their modulus. This covariance is nearly diagonalized by a second wavelet transform, which defines the scattering covariance. We show that this set of moments characterizes a wide range of non-Gaussian properties of multi-scale processes. This is analyzed for a variety of processes, including fractional Brownian motions, Poisson, multifractal random walks and Hawkes processes. We prove that self-similar processes have a scattering covariance matrix which is scale invariant. This property can be estimated numerically and defines a class of wide-sense self-similar processes. We build maximum entropy models conditioned by scattering covariance coefficients, and generate new time-series with a microcanonical sampling algorithm. Applications are shown for highly non-Gaussian financial and turbulence time-series.  ( 2 min )
    NetSentry: A Deep Learning Approach to Detecting Incipient Large-scale Network Attacks. (arXiv:2202.09873v2 [cs.CR] UPDATED)
    Machine Learning (ML) techniques are increasingly adopted to tackle ever-evolving high-profile network attacks, including DDoS, botnet, and ransomware, due to their unique ability to extract complex patterns hidden in data streams. These approaches are, however, routinely validated with data collected in the same environment, and, as we uncover, their performance degrades when deployed in different network topologies and/or applied to previously unseen traffic. This suggests malicious/benign behaviors are largely learned superficially and that ML-based Network Intrusion Detection Systems (NIDS) need revisiting to be effective in practice. In this paper we dive into the mechanics of large-scale network attacks, with a view to understanding how to use ML for Network Intrusion Detection (NID) in a principled way. We reveal that, although cyberattacks vary significantly in terms of payloads, vectors and targets, their early stages, which are critical to successful attack outcomes, share many similarities and exhibit important temporal correlations. Therefore, we treat NID as a time-sensitive task and propose NetSentry, perhaps the first of its kind NIDS that builds on Bidirectional Asymmetric LSTM (Bi-ALSTM), an original ensemble of sequential neural models, to detect network threats before they spread. We cross-evaluate NetSentry using two practical datasets, training on one and testing on the other, and demonstrate F1 score gains above 33% over the state-of-the-art, as well as up to 3 times higher rates of detecting attacks such as XSS and web bruteforce. Further, we put forward a novel data augmentation technique that boosts the generalization abilities of a broad range of supervised deep learning algorithms, leading to average F1 score gains above 35%.  ( 2 min )
    Infographics Wizard: Flexible Infographics Authoring and Design Exploration. (arXiv:2204.09904v1 [cs.HC])
    Infographics are an aesthetic visual representation of information following specific design principles of human perception. Designing infographics can be a tedious process for non-experts and time-consuming even for professional designers. With the help of designers, we propose a semi-automated infographic framework for general structured and flow-based infographic design generation. For novice designers, our framework automatically creates and ranks infographic designs for a user-provided text with no design input required. Expert designers can still provide custom design inputs to customize the infographics. We also contribute a dataset of individual visual group (VG) designs (in SVG), along with a 1k complete infographic image dataset with segmented VGs. Evaluation results confirm that, using our framework, designers of all expertise levels can generate generic infographic designs faster than with existing methods while maintaining the same quality as hand-designed infographic templates.  ( 2 min )
    From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness. (arXiv:2110.03753v3 [cs.LG] UPDATED)
    Message Passing Neural Networks (MPNNs) are a common type of Graph Neural Network (GNN), in which each node's representation is computed recursively by aggregating representations (messages) from its immediate neighbors akin to a star-shaped pattern. MPNNs are appealing for being efficient and scalable, however their expressiveness is upper-bounded by the 1st-order Weisfeiler-Lehman isomorphism test (1-WL). In response, prior works propose highly expressive models at the cost of scalability and sometimes generalization performance. Our work stands between these two regimes: we introduce a general framework to uplift any MPNN to be more expressive, with limited scalability overhead and greatly improved practical performance. We achieve this by extending local aggregation in MPNNs from star patterns to general subgraph patterns (e.g., k-egonets): in our framework, each node representation is computed as the encoding of a surrounding induced subgraph rather than the encoding of immediate neighbors only (i.e. a star). We choose the subgraph encoder to be a GNN (mainly MPNNs, considering scalability) to design a general framework that serves as a wrapper to uplift any GNN. We call our proposed method GNN-AK (GNN As Kernel), as the framework resembles a convolutional neural network by replacing the kernel with GNNs. Theoretically, we show that our framework is strictly more powerful than 1&2-WL, and is not less powerful than 3-WL. We also design subgraph sampling strategies which greatly reduce memory footprint and improve speed while maintaining performance. Our method sets new state-of-the-art performance by large margins for several well-known graph ML tasks; specifically, 0.08 MAE on ZINC, 74.79% and 86.887% accuracy on CIFAR10 and PATTERN respectively.  ( 2 min )
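The k-egonet around a node, i.e. the induced subgraph the framework encodes in place of the star, can be extracted with a plain breadth-first search; this sketch returns only the node set and assumes an adjacency-dict graph representation:

```python
from collections import deque

def k_egonet(adj, root, k):
    """Return the node set of the k-hop egonet around `root`.
    `adj` maps each node to an iterable of its neighbours."""
    seen = {root}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen
```

With k=1 this recovers the ordinary star (node plus immediate neighbours); larger k gives the richer subgraph patterns the paper aggregates over.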
    CNLL: A Semi-supervised Approach For Continual Noisy Label Learning. (arXiv:2204.09881v1 [cs.CV])
    The task of continual learning requires careful design of algorithms that can tackle catastrophic forgetting. However, the noisy label, which is inevitable in a real-world scenario, seems to exacerbate the situation. While very few studies have addressed the issue of continual learning under noisy labels, long training time and complicated training schemes limit their applications in most cases. In contrast, we propose a simple purification technique to effectively cleanse the online data stream that is both cost-effective and more accurate. After purification, we perform fine-tuning in a semi-supervised fashion that ensures the participation of all available samples. Training in this fashion helps us learn a better representation that results in state-of-the-art (SOTA) performance. Through extensive experimentation on 3 benchmark datasets, MNIST, CIFAR10 and CIFAR100, we show the effectiveness of our proposed approach. We achieve a 24.8% performance gain for CIFAR10 with 20% noise over previous SOTA methods. Our code is publicly available.  ( 2 min )
    Holmes: An Efficient and Lightweight Semantic Based Anomalous Email Detector. (arXiv:2104.08044v11 [cs.CR] UPDATED)
    Email threats are a serious issue for enterprise security, spanning various malicious scenarios such as phishing, fraud, blackmail and malvertisement. A traditional anti-spam gateway commonly requires maintaining a greylist to filter out unexpected emails based on suspicious vocabulary appearing in the mail subject and content. However, this signature-based approach cannot effectively discover novel and unknown suspicious emails that exploit currently trending topics, such as COVID-19 and the US election. To address the problem, in this paper we present Holmes, an efficient and lightweight semantic-based engine for anomalous email detection. Holmes converts each email event log to a sentence through word embedding and then extracts interesting items among them by novelty detection. Based on our observations, we claim that, in an enterprise environment, there is a stable relation between senders and receivers, while suspicious emails commonly come from unusual sources, which can be detected through rareness selection. We evaluate the performance of Holmes in a real-world enterprise environment that sends and receives around 5,000 emails each day. As a result, Holmes achieves a high detection rate (outputting around 200 suspicious emails per day) while maintaining a low false alarm rate for anomaly detection.  ( 3 min )
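The rareness selection over sender-receiver relations can be caricatured in a few lines: count how often each (sender, receiver) pair occurs in the log and flag pairs seen fewer than a threshold number of times. The threshold and the pair-counting granularity are illustrative assumptions, not details from the paper.

```python
from collections import Counter

def rare_senders(pairs, min_count=2):
    """Flag (sender, receiver) pairs seen fewer than `min_count`
    times in the email log -- a toy proxy for rareness selection."""
    counts = Counter(pairs)
    return {pair for pair, c in counts.items() if c < min_count}
```

In a real deployment the counts would be maintained over a sliding window and combined with the semantic novelty signal described in the abstract.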
    Neural Topic Modeling of Psychotherapy Sessions. (arXiv:2204.10189v1 [cs.CL])
    In this work, we compare different neural topic modeling methods in learning the topical propensities of different psychiatric conditions from the psychotherapy session transcripts parsed from speech recordings. We also incorporate temporal modeling to put this additional interpretability to action by parsing out topic similarities as a time series in a turn-level resolution. We believe this topic modeling framework can offer interpretable insights for the therapist to optimally decide his or her strategy and improve the psychotherapy effectiveness.  ( 2 min )
    Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data. (arXiv:2010.03622v5 [cs.LG] UPDATED)
    Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic "expansion" assumption, which states that a low probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.  ( 2 min )
    Why I'm not Answering: Understanding Determinants of Classification of an Abstaining Classifier for Cancer Pathology Reports. (arXiv:2009.05094v5 [cs.LG] UPDATED)
    Safe deployment of deep learning systems in critical real world applications requires models to make very few mistakes, and only under predictable circumstances. In this work, we address this problem using an abstaining classifier that is tuned to have $>$95% accuracy, and then identify the determinants of abstention using LIME. Essentially, we are training our model to learn the attributes of pathology reports that are likely to lead to incorrect classifications, albeit at the cost of reduced sensitivity. We demonstrate an abstaining classifier in a multitask setting for classifying cancer pathology reports from the NCI SEER cancer registries on six tasks of interest. For these tasks, we reduce the classification error rate by factors of 2--5 by abstaining on 25--45% of the reports. For the specific task of classifying cancer site, we are able to identify metastasis, reports involving lymph nodes, and discussion of multiple cancer sites as responsible for many of the classification mistakes, and observe that the extent and types of mistakes vary systematically with cancer site (e.g., breast, lung, and prostate). When combining across three of the tasks, our model classifies 50% of the reports with an accuracy greater than 95% for three of the six tasks, and greater than 85% for all six tasks on the retained samples. Furthermore, we show that LIME provides a better determinant of classification than measures of word occurrence alone. By combining a deep abstaining classifier with feature identification using LIME, we are able to identify concepts responsible for both correctness and abstention when classifying cancer sites from pathology reports. The improvement of LIME over keyword searches is statistically significant, presumably because words are assessed in context and have been identified as a local determinant of classification.  ( 3 min )
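The paper's deep abstaining classifier is trained with an explicit abstention mechanism; as a much simpler stand-in, the sketch below abstains whenever the top class probability falls below a tuned threshold, which captures the spirit of trading sensitivity for >95% accuracy on the retained samples:

```python
def predict_or_abstain(probs, threshold=0.95):
    """Return the argmax class index, or None (abstain) when the
    top probability falls below `threshold`. Thresholding here is
    an illustrative substitute for the paper's learned abstention."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None
```

Sweeping the threshold traces out the accuracy-versus-abstention-rate curve analogous to the 25--45% abstention figures reported above.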
    Learnable Model Augmentation Self-Supervised Learning for Sequential Recommendation. (arXiv:2204.10128v1 [cs.IR])
    Sequential recommendation aims to predict the next item based on user behaviour. Recently, Self-Supervised Learning (SSL) has been proposed to improve recommendation performance. However, most existing SSL methods use a uniform data augmentation scheme, which loses the sequential correlation of the original sequence. To this end, in this paper we propose Learnable Model Augmentation self-supervised learning for sequential Recommendation (LMA4Rec). Specifically, LMA4Rec first takes model augmentation as a supplementary method to data augmentation to generate views. Then, LMA4Rec uses learnable Bernoulli dropout to implement learnable model augmentation operations. Next, self-supervised learning is applied between the contrastive views to extract self-supervised signals from the original sequence. Finally, experiments on three public datasets show that LMA4Rec effectively improves sequential recommendation performance compared with baseline methods.  ( 2 min )
    TorchSparse: Efficient Point Cloud Inference Engine. (arXiv:2204.10319v1 [cs.LG])
    Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide real-time user experience and ensure user safety. Unlike conventional dense workloads, the sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently on the general-purpose hardware. Furthermore, existing sparse acceleration techniques for 2D images do not translate to 3D point clouds. In this paper, we introduce TorchSparse, a high-performance point cloud inference engine that accelerates the sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement. It applies adaptive matrix multiplication grouping to trade computation for better regularity, achieving 1.4-1.5x speedup for matrix multiplication. It also optimizes the data movement by adopting vectorized, quantized and fused locality-aware memory access, reducing the memory movement cost by 2.7x. Evaluated on seven representative models across three benchmark datasets, TorchSparse achieves 1.6x and 1.5x measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively.  ( 2 min )
    Learning Future Object Prediction with a Spatiotemporal Detection Transformer. (arXiv:2204.10321v1 [cs.CV])
    We explore future object prediction -- a challenging problem where all objects visible in a future video frame are to be predicted. We propose to tackle this problem end-to-end by training a detection transformer to directly output future objects. In order to make accurate predictions about the future, it is necessary to capture the dynamics in the scene, both of other objects and of the ego-camera. We extend existing detection transformers in two ways to capture the scene dynamics. First, we experiment with three different mechanisms that enable the model to spatiotemporally process multiple frames. Second, we feed ego-motion information to the model via cross-attention. We show that both of these cues substantially improve future object prediction performance. Our final approach learns to capture the dynamics and make predictions on par with an oracle for 100 ms prediction horizons, and outperform baselines for longer prediction horizons.  ( 2 min )
    Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces. (arXiv:1905.09449v5 [cs.LG] UPDATED)
    The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite the benefits of over-parameterization, a huge amount of parameters makes deep networks cumbersome in daily life applications. Though techniques such as pruning and distillation are developed, they are expensive in fully training a dense network as backward selection methods, and there is still a void on systematically exploring forward selection methods for learning structural sparsity in deep networks. To fill in this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generate a family of models from simple to complex ones along the dynamics via coupling a pair of parameters, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This kind of differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks could be established under the Kurdyka-Lojasiewicz framework. Experimental evidence shows that our method achieves comparable and even better performance than the competitive optimizers in exploring the sparse structure of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, our method unveils `winning tickets' in early epochs: the effective sparse network structures with comparable test accuracy to fully trained over-parameterized models, that are further transferable to similar alternative tasks. Furthermore, our method is able to grow networks efficiently with adaptive filter configurations, demonstrating a good performance with much less computational cost. Codes and models can be downloaded at {https://github.com/DessiLBI2020/DessiLBI}.  ( 3 min )
    Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware. (arXiv:2204.10183v1 [cs.LG])
    The majority of IoT devices like smartwatches, smart plugs, HVAC controllers, etc., are powered by hardware with constrained specifications (low memory, clock speed and processor) that are insufficient to accommodate and execute large, high-quality models. On such resource-constrained devices, manufacturers still manage to provide attractive functionalities (to boost sales) by following the traditional approach of programming IoT devices/products to collect and transmit data (image, audio, sensor readings, etc.) to their cloud-based ML analytics platforms. For decades, this online approach has faced issues such as compromised data streams, non-real-time analytics due to latency, bandwidth constraints, costly subscriptions, recent privacy concerns raised by users and the GDPR guidelines, etc. In this paper, to enable ultra-fast and accurate AI-based offline analytics on resource-constrained IoT devices, we present an end-to-end multi-component model optimization sequence and open-source its implementation. Researchers and developers can use our optimization sequence to optimize memory- and computation-intensive models in multiple aspects in order to produce small, low-latency, low-power models that can comfortably fit and execute on resource-constrained hardware. The experimental results show that our optimization components can produce models that are: (i) 12.06x compressed; (ii) 0.13% to 0.27% more accurate; (iii) orders of magnitude faster, with unit inference at 0.06 ms. Our optimization sequence is generic and can be applied to any state-of-the-art model trained for anomaly detection, predictive maintenance, robotics, voice recognition, and machine vision.  ( 2 min )
    A Sandbox Tool to Bias(Stress)-Test Fairness Algorithms. (arXiv:2204.10233v1 [cs.LG])
    Motivated by the growing importance of reducing unfairness in ML predictions, Fair-ML researchers have presented an extensive suite of algorithmic "fairness-enhancing" remedies. Most existing algorithms, however, are agnostic to the sources of the observed unfairness. As a result, the literature currently lacks guiding frameworks to specify conditions under which each algorithmic intervention can potentially alleviate the underpinning cause of unfairness. To close this gap, we scrutinize the underlying biases (e.g., in the training data or design choices) that cause observational unfairness. We present a bias-injection sandbox tool to investigate fairness consequences of various biases and assess the effectiveness of algorithmic remedies in the presence of specific types of bias. We call this process the bias(stress)-testing of algorithmic interventions. Unlike existing toolkits, ours provides a controlled environment to counterfactually inject biases in the ML pipeline. This stylized setup offers the distinct capability of testing fairness interventions beyond observational data and against an unbiased benchmark. In particular, we can test whether a given remedy can alleviate the injected bias by comparing the predictions resulting after the intervention in the biased setting with true labels in the unbiased regime -- that is, before any bias injection. We illustrate the utility of our toolkit via a proof-of-concept case study on synthetic data. Our empirical analysis showcases the type of insights that can be obtained through our simulations.  ( 2 min )
    Feature anomaly detection system (FADS) for intelligent manufacturing. (arXiv:2204.10318v1 [cs.CV])
    Anomaly detection is important for industrial automation and part quality assurance, and while humans can easily detect anomalies in components given a few examples, designing a generic automated system that can perform at or above human capability remains a challenge. In this work, we present a simple new anomaly detection algorithm called FADS (feature-based anomaly detection system), which leverages pretrained convolutional neural networks (CNNs) to generate a statistical model of nominal inputs by observing the activation of the convolutional filters. During inference, the system compares the convolutional filter activations of a new input to the statistical model and flags activations that are outside the expected range of values and therefore likely anomalous. By using a pretrained network, FADS achieves performance similar to or better than other machine learning approaches to anomaly detection while requiring no tuning of the CNN weights. We demonstrate FADS's ability by detecting process parameter changes on a custom dataset of additively manufactured lattices. The FADS localization algorithm shows that textural differences visible on the surface can be used to detect process parameter changes. In addition, we test FADS on benchmark datasets, such as the MVTec Anomaly Detection dataset, and report good results.  ( 2 min )
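A toy version of the FADS idea fits per-filter statistics on nominal activations and flags inputs whose activations leave the expected band. This sketch assumes activations have already been pooled to one vector per sample, and the 3-sigma band is an illustrative choice rather than the paper's exact criterion:

```python
import numpy as np

def fit_stats(nominal_acts):
    """Per-filter mean/std over nominal activation vectors
    (rows = samples, columns = filters)."""
    a = np.asarray(nominal_acts, dtype=float)
    return a.mean(axis=0), a.std(axis=0) + 1e-8  # epsilon avoids div by zero

def is_anomalous(act, mean, std, n_sigma=3.0):
    """Flag the input if any filter activation leaves the
    mean +/- n_sigma * std band learned on nominal data."""
    z = np.abs((np.asarray(act, dtype=float) - mean) / std)
    return bool((z > n_sigma).any())
```

Because only statistics of a frozen pretrained network are collected, no CNN weights are updated, matching the "no tuning" property described above.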
    A two-level machine learning framework for predictive maintenance: comparison of learning formulations. (arXiv:2204.10083v1 [cs.LG])
    Predicting incoming failures and scheduling maintenance based on sensors information in industrial machines is increasingly important to avoid downtime and machine failure. Different machine learning formulations can be used to solve the predictive maintenance problem. However, many of the approaches studied in the literature are not directly applicable to real-life scenarios. Indeed, many of those approaches usually either rely on labelled machine malfunctions in the case of classification and fault detection, or rely on finding a monotonic health indicator on which a prediction can be made in the case of regression and remaining useful life estimation, which is not always feasible. Moreover, the decision-making part of the problem is not always studied in conjunction with the prediction phase. This paper aims to design and compare different formulations for predictive maintenance in a two-level framework and design metrics that quantify both the failure detection performance as well as the timing of the maintenance decision. The first level is responsible for building a health indicator by aggregating features using a learning algorithm. The second level consists of a decision-making system that can trigger an alarm based on this health indicator. Three degrees of refinements are compared in the first level of the framework, from simple threshold-based univariate predictive technique to supervised learning methods based on the remaining time before failure. We choose to use the Support Vector Machine (SVM) and its variations as the common algorithm used in all the formulations. We apply and compare the different strategies on a real-world rotating machine case study and observe that while a simple model can already perform well, more sophisticated refinements enhance the predictions for well-chosen parameters.  ( 2 min )
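The two-level structure can be sketched in a few lines: level one aggregates a window of features into a scalar health indicator (here a plain mean, standing in for the learned SVM-based aggregation in the paper), and level two raises an alarm when the indicator crosses a threshold:

```python
def health_indicator(window):
    """Level 1: aggregate a window of sensor features into a scalar
    health score. A plain mean is used here purely as a placeholder
    for the learned aggregation described in the paper."""
    return sum(window) / len(window)

def alarm(scores, threshold):
    """Level 2: return the first time index at which the health
    indicator reaches the threshold, or None if it never does."""
    for t, s in enumerate(scores):
        if s >= threshold:
            return t
    return None
```

The metrics the paper proposes would then score both whether an alarm fires before failure and how early it fires.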
    Learning spatiotemporal features from incomplete data for traffic flow prediction using hybrid deep neural networks. (arXiv:2204.10222v1 [cs.LG])
    Urban traffic flow prediction using data-driven models can play an important role in route planning and preventing congestion on highways. These methods utilize data collected from traffic recording stations at different timestamps to predict the future status of traffic. Hence, data collection, transmission, storage, and extraction techniques can have a significant impact on the performance of the traffic flow model. On the other hand, a comprehensive database can provide the opportunity for using complex, yet reliable predictive models such as deep learning methods. However, most of these methods have difficulties in handling missing values and outliers. This study focuses on hybrid deep neural networks to predict traffic flow in the California Freeway Performance Measurement System (PeMS) with missing values. The proposed networks are based on a combination of recurrent neural networks (RNNs) to consider the temporal dependencies in the data recorded in each station and convolutional neural networks (CNNs) to take the spatial correlations in the adjacent stations into account. Various architecture configurations with series and parallel connections are considered based on RNNs and CNNs, and several prevalent data imputation techniques are used to examine the robustness of the hybrid networks to missing values. A comprehensive analysis performed on two different datasets from PeMS indicates that the proposed series-parallel hybrid network with the mean imputation technique achieves the lowest error in predicting the traffic flow and is robust to missing values up until 21% missing ratio in both complete and incomplete training data scenarios when applied to an incomplete test data.  ( 2 min )
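Mean imputation, the technique the study found most robust for the hybrid networks, simply replaces each missing value with the column mean of the observed entries; a minimal NumPy sketch:

```python
import numpy as np

def mean_impute(x):
    """Replace NaNs in each column (e.g., one column per recording
    station) with that column's mean over the observed entries."""
    x = np.array(x, dtype=float)
    col_means = np.nanmean(x, axis=0)   # per-column mean ignoring NaNs
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = col_means[cols]
    return x
```

More elaborate imputers (interpolation, model-based) plug into the same pipeline step before the RNN/CNN hybrid consumes the data.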
    Automated analysis of fibrous cap in intravascular optical coherence tomography images of coronary arteries. (arXiv:2204.10162v1 [cs.LG])
    Thin-cap fibroatheroma (TCFA) and plaque rupture have been recognized as the most frequent risk factors for thrombosis and acute coronary syndrome. Intravascular optical coherence tomography (IVOCT) can identify TCFA and assess cap thickness, which provides an opportunity to assess plaque vulnerability. We developed an automated method that can detect lipidous plaque and assess fibrous cap thickness in IVOCT images. This study analyzed a total of 4,360 IVOCT image frames of 77 lesions among 41 patients. To improve segmentation performance, preprocessing included lumen segmentation, pixel-shifting, and noise filtering on the raw polar (r, theta) IVOCT images. We used the DeepLab-v3 plus deep learning model to classify lipidous plaque pixels. After lipid detection, we automatically detected the outer border of the fibrous cap using a special dynamic programming algorithm and assessed the cap thickness. Our method provided excellent discriminability of lipid plaque with a sensitivity of 85.8% and an A-line Dice coefficient of 0.837. By comparing lipid angle measurements between two analysts following editing of our automated software, we found good agreement by Bland-Altman analysis (difference 6.7+/-17 degrees; mean 196 degrees). Our method accurately detected the fibrous cap from the detected lipid plaque. Automated analysis required significant modification for only 5.5% of frames. Furthermore, our method showed good agreement of fibrous cap thickness between two analysts with Bland-Altman analysis (4.2+/-14.6 micron; mean 175 micron), indicating little bias between users and good reproducibility of the measurement. We developed a fully automated method for fibrous cap quantification in IVOCT images, resulting in good agreement with determinations by analysts. The method has great potential to enable highly automated, repeatable, and comprehensive evaluations of TCFAs.  ( 2 min )
    The NIST CTS Speaker Recognition Challenge. (arXiv:2204.10228v1 [eess.AS])
    The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the CTS challenge since August 2020. The current iteration of the CTS Challenge is a leaderboard-style speaker recognition evaluation using telephony data extracted from the unexposed portions of the Call My Net 2 (CMN2) and Multi-Language Speech (MLS) corpora collected by the LDC. The CTS Challenge is currently organized in a similar manner to the SRE19 CTS Challenge, offering only an open training condition using two evaluation subsets, namely Progress and Test. Unlike in the SRE19 Challenge, no training or development set was initially released, and NIST has publicly released the leaderboards on both subsets for the CTS Challenge. Which subset (i.e., Progress or Test) a trial belongs to is unknown to challenge participants, and each system submission needs to contain outputs for all of the trials. The CTS Challenge has also served, and will continue to do so, as a prerequisite for entrance to the regular SREs (such as SRE21). Since August 2020, a total of 53 organizations (forming 33 teams) from academia and industry have participated in the CTS Challenge and submitted more than 4400 valid system outputs. This paper presents an overview of the evaluation and several analyses of system performance for some primary conditions in the CTS Challenge. The CTS Challenge results thus far indicate remarkable improvements in performance due to 1) speaker embeddings extracted using large-scale and complex neural network architectures such as ResNets along with angular margin losses for speaker embedding extraction, 2) extensive data augmentation, 3) the use of large amounts of in-house proprietary data from a large number of labeled speakers, 4) long-duration fine-tuning.  ( 2 min )
    Revisiting Gaussian mixture critic in off-policy reinforcement learning: a sample-based approach. (arXiv:2204.10256v1 [cs.LG])
    Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks. Examples of this behavior include the D4PG and DMPO algorithms as compared to DDPG and MPO, respectively [Barth-Maron et al., 2018; Hoffman et al., 2020]. However, both agents rely on the C51 critic for value estimation. One major drawback of the C51 approach is its requirement of prior knowledge about the minimum and maximum values a policy can attain as well as the number of bins used, which fixes the resolution of the distributional estimate. While the DeepMind control suite of tasks utilizes standardized rewards and episode lengths, thus enabling the entire suite to be solved with a single setting of these hyperparameters, this is often not the case. This paper revisits a natural alternative that removes this requirement, namely a mixture of Gaussians, and a simple sample-based loss function to train it in an off-policy regime. We empirically evaluate its performance on a broad range of continuous control tasks and demonstrate that it eliminates the need for these distributional hyperparameters and achieves state-of-the-art performance on a variety of challenging tasks (e.g. the humanoid, dog, quadruped, and manipulator domains). Finally, we provide an implementation in the Acme agent repository.  ( 2 min )
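    The core idea can be sketched in a few lines: train a Gaussian-mixture critic by maximizing the log-likelihood of sampled returns, which needs no predefined support grid. This is a pure-Python toy, not the Acme implementation; the function names and the scalar setting are my simplifications.

```python
import math

def mog_log_prob(x, weights, means, stds):
    """Log-density of a scalar return x under a Gaussian mixture."""
    total = 0.0
    for w, mu, sigma in zip(weights, means, stds):
        total += w * math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return math.log(total)

def sample_based_loss(return_samples, weights, means, stds):
    """Negative log-likelihood of sampled returns: unlike C51, no fixed
    [v_min, v_max] support grid or bin count has to be chosen in advance."""
    return -sum(mog_log_prob(x, weights, means, stds) for x in return_samples) / len(return_samples)
```

    Minimizing this loss over the mixture parameters fits the return distribution directly from sampled targets, which is what removes the distributional hyperparameters.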
    IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages. (arXiv:2204.10195v1 [cs.CL])
    In recent years, there has been a lot of focus on offensive content. The amount of offensive content generated by social media is increasing at an alarming rate. This created a greater need to address this issue than ever before. To address these issues, the organizers of "Dravidian-Code Mixed HASOC-2020" have created two challenges. Task 1 involves identifying offensive content in Malayalam data, whereas Task 2 includes Malayalam and Tamil Code Mixed Sentences. Our team participated in Task 2. In our suggested model, we experiment with multilingual BERT to extract features, and three different classifiers are used on extracted features. Our model received a weighted F1 score of 0.70 for Malayalam data and was ranked fifth; we also received a weighted F1 score of 0.573 for Tamil Code Mixed data and were ranked eleventh.  ( 2 min )
    Unsupervised Numerical Reasoning to Extract Phenotypes from Clinical Text by Leveraging External Knowledge. (arXiv:2204.10202v1 [cs.CL])
    Extracting phenotypes from clinical text has been shown to be useful for a variety of clinical use cases such as identifying patients with rare diseases. However, reasoning with numerical values remains challenging for phenotyping in clinical text, for example, temperature 102F representing Fever. Current state-of-the-art phenotyping models are able to detect general phenotypes, but perform poorly when they detect phenotypes requiring numerical reasoning. We present a novel unsupervised methodology leveraging external knowledge and contextualized word embeddings from ClinicalBERT for numerical reasoning in a variety of phenotypic contexts. Comparing against unsupervised benchmarks, it shows a substantial performance improvement with absolute gains on generalized Recall and F1 scores up to 79% and 71%, respectively. In the supervised setting, it also surpasses the performance of alternative approaches with absolute gains on generalized Recall and F1 scores up to 70% and 44%, respectively.  ( 2 min )
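    As a rough illustration of numerical reasoning over clinical mentions, the sketch below maps a numeric vital-sign mention such as "temperature 102F" to a phenotype label via external range knowledge. The regex, value ranges, and labels are illustrative stand-ins, not the paper's pipeline (which combines external knowledge with ClinicalBERT embeddings).

```python
import re

# Hypothetical ranges; real thresholds would come from external clinical knowledge.
PHENOTYPE_RANGES = {
    ("temperature", "F"): [((100.4, 110.0), "Fever"), ((95.0, 100.4), "Normothermia")],
}

def extract_phenotype(text):
    """Map a numeric mention like 'temperature 102F' to a phenotype label."""
    m = re.search(r"(temperature)\s*([\d.]+)\s*(F)", text, re.IGNORECASE)
    if not m:
        return None
    name, value, unit = m.group(1).lower(), float(m.group(2)), m.group(3).upper()
    for (lo, hi), label in PHENOTYPE_RANGES.get((name, unit), []):
        if lo <= value < hi:
            return label
    return None
```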
    Is Neuron Coverage Needed to Make Person Detection More Robust?. (arXiv:2204.10027v1 [cs.CV])
    The growing use of deep neural networks (DNNs) in safety- and security-critical areas like autonomous driving raises the need for their systematic testing. Coverage-guided testing (CGT) is an approach that applies mutation or fuzzing according to a predefined coverage metric to find inputs that cause misbehavior. With the introduction of a neuron coverage metric, CGT has also recently been applied to DNNs. In this work, we apply CGT to the task of person detection in crowded scenes. The proposed pipeline uses YOLOv3 for person detection and includes finding DNN bugs via sampling and mutation, and subsequent DNN retraining on the updated training set. To be a bug, we require a mutated image to cause a significant performance drop compared to a clean input. In accordance with the CGT, we also consider an additional requirement of increased coverage in the bug definition. In order to explore several types of robustness, our approach includes natural image transformations, corruptions, and adversarial examples generated with the Daedalus attack. The proposed framework has uncovered several thousand cases of incorrect DNN behavior. The relative change in mAP performance of the retrained models reached on average between 26.21% and 64.24% for different robustness types. However, we have found no evidence that the investigated coverage metrics can be advantageously used to improve robustness.  ( 2 min )
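    The bug definition can be written as a small predicate. This is a hedged sketch with illustrative thresholds; the paper's exact criteria may differ.

```python
def is_bug(clean_ap, mutated_ap, clean_cov, mutated_cov,
           drop_threshold=0.2, require_coverage_gain=True):
    """CGT-style bug predicate (sketch): a mutated image counts as a bug when it
    causes a significant AP drop versus the clean input, optionally also requiring
    that it increases the coverage metric."""
    significant_drop = (clean_ap - mutated_ap) >= drop_threshold * clean_ap
    coverage_gain = mutated_cov > clean_cov
    return significant_drop and (coverage_gain or not require_coverage_gain)
```

    Toggling `require_coverage_gain` corresponds to running the pipeline with or without the coverage requirement, which is exactly the comparison the study's negative finding rests on.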
    Evolution and use of data science vocabulary. How much have we changed in 13 years?. (arXiv:2204.10174v1 [cs.DL])
    Here I present an investigation of the evolution and use of vocabulary in data science over the last 13 years. Based on a rigorous statistical analysis, a database with 12,787 documents containing the words "data science" in the title, abstract or keywords is analyzed. It is proposed to classify the evolution of this discipline into three periods: emergence, growth and boom. Characteristic words and pioneering documents are identified for each period. By identifying the distinctive vocabulary and relevant topics of data science, classified by time period, these results add value to the scientific community of this discipline.  ( 2 min )
    Detecting Topology Attacks against Graph Neural Networks. (arXiv:2204.10072v1 [cs.LG])
    Graph neural networks (GNNs) have been widely used in many real applications, and recent studies have revealed their vulnerabilities against topology attacks. To address this issue, existing efforts have mainly been dedicated to improving the robustness of GNNs, while little attention has been paid to the detection of such attacks. In this work, we study the victim node detection problem under topology attacks against GNNs. Our approach is built upon the key observation rooted in the intrinsic message passing nature of GNNs. That is, the neighborhood of a victim node tends to have two competing group forces, pushing the node classification results towards the original label and the targeted label, respectively. Based on this observation, we propose to detect victim nodes by deliberately designing an effective measurement of the neighborhood variance for each node. Extensive experimental results on four real-world datasets and five existing topology attacks show the effectiveness and efficiency of the proposed detection approach.  ( 2 min )
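    The key observation admits a compact sketch: score each node by the variance of its neighbors' predicted class distributions, which is high when the neighborhood splits into two competing groups. The exact measurement in the paper is more elaborate; this is an assumption-laden toy where `soft_labels` maps each node to a class-probability list.

```python
def neighborhood_variance(graph, soft_labels, node):
    """Average squared deviation of neighbors' class distributions from their mean.
    Under a topology attack, a victim node's neighbors push towards two different
    labels, so this score tends to be large."""
    neighbors = graph[node]
    k = len(soft_labels[node])
    mean = [sum(soft_labels[n][c] for n in neighbors) / len(neighbors) for c in range(k)]
    return sum(
        sum((soft_labels[n][c] - mean[c]) ** 2 for c in range(k))
        for n in neighbors
    ) / len(neighbors)
```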
    Working memory inspired hierarchical video decomposition with transformative representations. (arXiv:2204.10105v1 [cs.CV])
    Video decomposition is very important to extract moving foreground objects from complex backgrounds in computer vision, machine learning, and medical imaging, e.g., extracting moving contrast-filled vessels from the complex and noisy backgrounds of X-ray coronary angiography (XCA). However, the challenges caused by dynamic backgrounds, overlapping heterogeneous environments and complex noises still exist in video decomposition. To solve these problems, this study is the first to introduce a flexible visual working memory model in video decomposition tasks to provide interpretable and high-performance hierarchical deep architecture, integrating the transformative representations between sensory and control layers from the perspective of visual and cognitive neuroscience. Specifically, robust PCA unrolling networks acting as a structure-regularized sensor layer decompose XCA into sparse/low-rank structured representations to separate moving contrast-filled vessels from noisy and complex backgrounds. Then, patch recurrent convolutional LSTM networks with a backprojection module embody unstructured random representations of the control layer in working memory, recurrently projecting spatiotemporally decomposed nonlocal patches into orthogonal subspaces for heterogeneous vessel retrieval and interference suppression. This video decomposition deep architecture effectively restores the heterogeneous profiles of intensity and the geometries of moving objects against the complex background interferences. Experiments show that the proposed method significantly outperforms state-of-the-art methods in accurate moving contrast-filled vessel extraction with excellent flexibility and computational efficiency.  ( 2 min )
    Robustness of Machine Learning Models Beyond Adversarial Attacks. (arXiv:2204.10046v1 [cs.LG])
    Correctly quantifying the robustness of machine learning models is a central aspect in judging their suitability for specific tasks, and thus, ultimately, for generating trust in the models. We show that the widely used concept of adversarial robustness and closely related metrics based on counterfactuals are not necessarily valid metrics for determining the robustness of ML models against perturbations that occur "naturally", outside specific adversarial attack scenarios. Additionally, we argue that generic robustness metrics in principle are insufficient for determining real-world-robustness. Instead, we propose a flexible approach that models possible perturbations in input data individually for each application. This is then combined with a probabilistic approach that computes the likelihood that a real-world perturbation will change a prediction, thus giving quantitative information about the robustness of the trained machine learning model. The method does not require access to the internals of the classifier and thus in principle works for any black-box model. It is, however, based on Monte-Carlo sampling and thus only suited for input spaces with small dimensions. We illustrate our approach on two datasets, as well as on analytically solvable cases. Finally, we discuss ideas on how real-world robustness could be computed or estimated in high-dimensional input spaces.  ( 2 min )
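    The probabilistic part reduces to a short Monte-Carlo loop: sample application-specific perturbations and count how often the black-box prediction changes. A minimal sketch, with a hypothetical classifier/perturbation interface of my own choosing:

```python
import random

def real_world_robustness(classifier, x, perturb, n_samples=1000, seed=0):
    """Monte-Carlo estimate of the probability that an application-specific
    perturbation changes the prediction. Black-box: only classifier outputs are used."""
    rng = random.Random(seed)
    base = classifier(x)
    flips = sum(1 for _ in range(n_samples) if classifier(perturb(x, rng)) != base)
    return flips / n_samples  # probability a natural perturbation flips the label
```

    For instance, with a threshold classifier and Gaussian measurement noise as the perturbation model, the returned fraction estimates the probability that noise flips the decision for that input.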
    Fluctuation-based Outlier Detection. (arXiv:2204.10007v1 [cs.LG])
    Outlier detection is an important topic in machine learning and has been used in a wide range of applications. Outliers are objects that are few in number and deviate from the majority of objects. As a result of these two properties, we show that outliers are susceptible to a mechanism called fluctuation. This article proposes a method called fluctuation-based outlier detection (FBOD) that achieves a low linear time complexity and detects outliers purely based on the concept of fluctuation without employing any distance, density or isolation measure, making it fundamentally different from all existing methods. FBOD first converts Euclidean datasets into graphs by using random links, then propagates the feature value according to the connections of the graph. Finally, by comparing the difference between the fluctuation of an object and that of its neighbors, FBOD determines the object with a larger difference as an outlier. The results of experiments comparing FBOD with seven state-of-the-art algorithms on eight real-world tabular datasets and three video datasets show that FBOD outperforms its competitors in the majority of cases and that FBOD has only 5% of the execution time of the fastest algorithm. The experiment codes are available at: https://github.com/FluctuationOD/Fluctuation-based-Outlier-Detection.  ( 2 min )
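    A loose sketch of the pipeline (random links, value propagation, fluctuation comparison) in pure Python; this simplifies the paper's propagation and scoring details considerably and is not the released implementation.

```python
import random

def fbod_scores(values, links_per_object=3, seed=0):
    """Sketch of FBOD on a 1-D feature: connect objects by random links, propagate
    feature values along the links (here, one averaging step), and score each object
    by how much its fluctuation differs from its neighbors' fluctuations."""
    rng = random.Random(seed)
    n = len(values)
    neighbors = [rng.sample([j for j in range(n) if j != i], links_per_object)
                 for i in range(n)]
    propagated = [sum(values[j] for j in neighbors[i]) / links_per_object for i in range(n)]
    fluctuation = [propagated[i] - values[i] for i in range(n)]
    return [
        abs(fluctuation[i] - sum(fluctuation[j] for j in neighbors[i]) / links_per_object)
        for i in range(n)
    ]
```

    An extreme value changes much more under propagation than its neighbors do, so its score dominates, without any distance or density computation over the full dataset.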
    Multi-Tier Platform for Cognizing Massive Electroencephalogram. (arXiv:2204.09840v1 [eess.SP])
    An end-to-end platform assembling multiple tiers is built for precisely cognizing brain activities. Being fed massive electroencephalogram (EEG) data, the time-frequency spectrograms are conventionally projected into the episode-wise feature matrices (seen as tier-1). A spiking neural network (SNN) based tier is designed to distill the principal information in terms of spike-streams from the rare features, which maintains the temporal implication in the nature of EEGs. The proposed tier-3 transposes the time and space domains of spike patterns from the SNN, and feeds the transposed pattern-matrices into an artificial neural network (ANN, Transformer specifically) known as tier-4, where a special spanning topology is proposed to match the two-dimensional input form. In this manner, cognition such as classification is conducted with high accuracy. For proof-of-concept, the sleep stage scoring problem is demonstrated by introducing multiple EEG datasets, with the largest comprising 42,560 hours recorded from 5,793 subjects. In our experiments, the platform achieves an overall cognition accuracy of 87% using EEG alone, which is 2% superior to the state-of-the-art. Moreover, our developed multi-tier methodology offers visible and graphical interpretations of the temporal characteristics of EEG by identifying the critical episodes, which is demanded in neurodynamics but hardly appears in conventional cognition scenarios.  ( 2 min )
    A data filling methodology for time series based on CNN and (Bi)LSTM neural networks. (arXiv:2204.09994v1 [cs.LG])
    In the process of collecting data from sensors, several circumstances can affect their continuity and validity, resulting in alterations of the data or loss of information. Although classical methods of statistics, such as interpolation-like techniques, can be used to approximate the missing data in a time series, the recent developments in Deep Learning (DL) have given impetus to innovative and much more accurate forecasting techniques. In the present paper, we develop two DL models aimed at filling data gaps, for the specific case of internal temperature time series obtained from monitored apartments located in Bolzano, Italy. The DL models developed in the present work are based on the combination of Convolutional Neural Networks (CNNs), Long Short-Term Memory Neural Networks (LSTMs), and Bidirectional LSTMs (BiLSTMs). Two key features of our models are the use of both pre- and post-gap data, and the exploitation of a correlated time series (the external temperature) in order to predict the target one (the internal temperature). Our approach manages to capture the fluctuating nature of the data and shows good accuracy in reconstructing the target time series. In addition, our models significantly improve the already good results from another DL architecture that is used as a baseline for the present work.  ( 2 min )
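    A naive stand-in helps fix ideas: interpolate across the gap using both the pre- and post-gap endpoints, then correct each filled value with the correlated external series' deviation from its own interpolation. The `coupling` constant is an illustrative assumption; the paper learns this relationship with CNN/(Bi)LSTM layers rather than a fixed coefficient.

```python
def fill_gap(target, external, gap_start, gap_end, coupling=0.5):
    """Fill target[gap_start..gap_end] using both pre- and post-gap endpoints plus
    a correlated external series. Assumes valid samples exist at gap_start-1 and
    gap_end+1."""
    filled = list(target)
    span = gap_end - gap_start + 1
    for k in range(span):
        t = (k + 1) / (span + 1)
        base = target[gap_start - 1] * (1 - t) + target[gap_end + 1] * t
        ext_base = external[gap_start - 1] * (1 - t) + external[gap_end + 1] * t
        filled[gap_start + k] = base + coupling * (external[gap_start + k] - ext_base)
    return filled
```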
    A Learned Index for Exact Similarity Search in Metric Spaces. (arXiv:2204.10028v1 [cs.DB])
    Indexing is an effective way to support efficient query processing in large databases. Recently the concept of learned index has been explored actively to replace or supplement traditional index structures with machine learning models to reduce storage and search costs. However, accurate and efficient similarity query processing in high-dimensional metric spaces remains an open challenge. In this paper, a novel indexing approach called LIMS is proposed that uses data clustering and pivot-based data transformation techniques to build learned indexes for efficient similarity query processing in metric spaces. The underlying data is partitioned into clusters such that each cluster follows a relatively uniform data distribution. Data redistribution is achieved by utilizing a small number of pivots for each cluster. Similar data are mapped into compact regions and the mapped values are totally ordered. Machine learning models are developed to approximate the position of each data record on the disk. Efficient algorithms are designed for processing range queries and nearest neighbor queries based on LIMS, and for index maintenance with dynamic updates. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes and state-of-the-art learned indexes.  ( 2 min )
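    The pivot-based mapping at the heart of this design can be sketched as follows: each point gets a totally ordered key (nearest-pivot id, distance to that pivot), so similar points land in compact key ranges that a learned model can then map to disk positions. The function name and key layout are my simplification of LIMS, not its actual interface.

```python
import math

def pivot_key(point, pivots):
    """Map a point to a totally ordered key: (index of nearest pivot, distance to
    that pivot). Sorting by these keys groups each cluster's points contiguously,
    ordered by distance to the cluster's pivot."""
    dists = [math.dist(point, p) for p in pivots]
    cluster = min(range(len(pivots)), key=lambda i: dists[i])
    return (cluster, dists[cluster])
```

    A range query then becomes a scan over a small key interval per relevant cluster, which is what makes the position learnable by a simple model.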
    Inducing Gaussian Process Networks. (arXiv:2204.09889v1 [cs.LG])
    Gaussian processes (GPs) are powerful but computationally expensive machine learning models, requiring an estimate of the kernel covariance matrix for every prediction. In large and complex domains, such as graphs, sets, or images, the choice of suitable kernel can also be non-trivial to determine, providing an additional obstacle to the learning task. Over the last decade, these challenges have resulted in significant advances being made in terms of scalability and expressivity, exemplified by, e.g., the use of inducing points and neural network kernel approximations. In this paper, we propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains while also facilitating scalable gradient-based learning methods. We consider both regression and (binary) classification tasks and report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods. We also demonstrate how IGNs can be used to effectively model complex domains using neural network architectures.  ( 2 min )
    Deep transfer learning for partial differential equations under conditional shift with DeepONet. (arXiv:2204.09810v1 [cs.LG])
    Traditional machine learning algorithms are designed to learn in isolation, i.e. address single tasks. The core idea of transfer learning (TL) is that knowledge gained in learning to perform one task (source) can be leveraged to improve learning performance in a related, but different, task (target). TL leverages and transfers previously acquired knowledge to address the expense of data acquisition and labeling, potential computational power limitations, and the dataset distribution mismatches. Although significant progress has been made in the fields of image processing, speech recognition, and natural language processing (for classification and regression) for TL, little work has been done in the field of scientific machine learning for functional regression and uncertainty quantification in partial differential equations. In this work, we propose a novel TL framework for task-specific learning under conditional shift with a deep operator network (DeepONet). Inspired by the conditional embedding operator theory, we measure the statistical distance between the source domain and the target feature domain by embedding conditional distributions onto a reproducing kernel Hilbert space. Task-specific operator learning is accomplished by fine-tuning task-specific layers of the target DeepONet using a hybrid loss function that allows for the matching of individual target samples while also preserving the global properties of the conditional distribution of target data. We demonstrate the advantages of our approach for various TL scenarios involving nonlinear PDEs under conditional shift. Our results include geometry domain adaptation and show that the proposed TL framework enables fast and efficient multi-task operator learning, despite significant differences between the source and target domains.  ( 2 min )
    Fairness in Graph Mining: A Survey. (arXiv:2204.09888v1 [cs.LG])
    Graph mining algorithms have been playing a significant role in myriad fields over the years. However, despite their promising performance on various graph analytical tasks, most of these algorithms lack fairness considerations. As a consequence, they could lead to discrimination towards certain populations when exploited in human-centered applications. Recently, algorithmic fairness has been extensively studied in graph-based applications. In contrast to algorithmic fairness on independent and identically distributed (i.i.d.) data, fairness in graph mining has exclusive backgrounds, taxonomies, and fulfilling techniques. In this survey, we provide a comprehensive and up-to-date introduction of existing literature under the context of fair graph mining. Specifically, we propose a novel taxonomy of fairness notions on graphs, which sheds light on their connections and differences. We further present an organized summary of existing techniques that promote fairness in graph mining. Finally, we summarize the widely used datasets in this emerging research field and provide insights on current research challenges and open questions, aiming at encouraging cross-breeding ideas and further advances.  ( 2 min )
    Ultra Marginal Feature Importance. (arXiv:2204.09938v1 [stat.ML])
    Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. The development of marginal feature importance methods, such as marginal contribution feature importance, attempts to break this trend by providing a useful framework for explaining relationships in data in an interpretable fashion. In this work, we generalize the framework of marginal contribution feature importance to improve performance with regards to detecting correlated interactions and reducing runtime. To do so, we consider "information subsets" of the set of features $F$ and show that our importance metric can be computed directly after applying fair representation learning methods from the AI fairness literature. The methods of optimal transport and linear regression are considered and explored experimentally for removing all the information of our feature of interest $f$ from the feature set $F$. Given these implementations, we show on real and simulated data that ultra marginal feature importance performs at least as well as marginal contribution feature importance, with substantially faster computation time and better performance in the presence of correlated interactions and unrelated features.  ( 2 min )
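    One of the removal methods the abstract names, linear regression, amounts to residualizing each feature on the feature of interest $f$. A 1-D pure-Python sketch (the paper also considers optimal transport, which is not shown):

```python
def residualize(feature, f):
    """Remove the linear information of f from a feature via simple least squares:
    fit feature ~ a + b*f and return the residuals."""
    n = len(f)
    mean_f = sum(f) / n
    mean_x = sum(feature) / n
    cov = sum((fi - mean_f) * (xi - mean_x) for fi, xi in zip(f, feature))
    var = sum((fi - mean_f) ** 2 for fi in f)
    slope = cov / var
    return [xi - (mean_x + slope * (fi - mean_f)) for fi, xi in zip(f, feature)]
```

    After residualizing every feature in $F$ on $f$, comparing model performance with and against the residualized set gives a marginal-style importance score for $f$.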
    Perception Visualization: Seeing Through the Eyes of a DNN. (arXiv:2204.09920v1 [cs.CV])
    Artificial intelligence (AI) systems power the world we live in. Deep neural networks (DNNs) are able to solve tasks in an ever-expanding landscape of scenarios, but our eagerness to apply these powerful models leads us to focus on their performance and deprioritises our ability to understand them. Current research in the field of explainable AI tries to bridge this gap by developing various perturbation or gradient-based explanation techniques. For images, these techniques fail to fully capture and convey the semantic information needed to elucidate why the model makes the predictions it does. In this work, we develop a new form of explanation that is radically different in nature from current explanation methods, such as Grad-CAM. Perception visualization provides a visual representation of what the DNN perceives in the input image by depicting what visual patterns the latent representation corresponds to. Visualizations are obtained through a reconstruction model that inverts the encoded features, such that the parameters and predictions of the original models are not modified. Results of our user study demonstrate that humans can better understand and predict the system's decisions when perception visualizations are available, thus easing the debugging and deployment of deep models as trusted systems.  ( 2 min )
    GUARD: Graph Universal Adversarial Defense. (arXiv:2204.09803v1 [cs.LG])
    Recently, graph convolutional networks (GCNs) have shown to be vulnerable to small adversarial perturbations, which becomes a severe threat and largely limits their applications in security-critical scenarios. To mitigate such a threat, considerable research efforts have been devoted to increasing the robustness of GCNs against adversarial attacks. However, current approaches for defense are typically designed for the whole graph and consider the global performance, posing challenges in protecting important local nodes from stronger adversarial targeted attacks. In this work, we present a simple yet effective method, named Graph Universal AdveRsarial Defense (GUARD). Unlike previous works, GUARD protects each individual node from attacks with a universal defensive patch, which is generated once and can be applied to any node (node-agnostic) in a graph. Extensive experiments on four benchmark datasets demonstrate that our method significantly improves robustness for several established GCNs against multiple adversarial attacks and outperforms existing adversarial defense methods by large margins. Our code is publicly available at https://github.com/EdisonLeeeee/GUARD.  ( 2 min )
    MedFACT: Modeling Medical Feature Correlations in Patient Health Representation Learning via Feature Clustering. (arXiv:2204.10011v1 [cs.LG])
    In healthcare prediction tasks, it is essential to exploit the correlations between medical features and learn better patient health representations. Existing methods try to estimate feature correlations only from data, or increase the quality of estimation by introducing task-specific medical knowledge. However, such methods either struggle to estimate the feature correlations due to insufficient training samples, or cannot be generalized to other tasks due to reliance on specific knowledge. Medical research reveals that not all medical features are strongly correlated. Thus, to address these issues, we group strongly correlated features and learn feature correlations in a group-wise manner to reduce the learning complexity without losing generality. In this paper, we propose a general patient health representation learning framework, MedFACT. We estimate correlations by measuring similarity between temporal patterns of medical features with kernel methods, and cluster features with strong correlations into groups. Each feature group is further formulated as a correlation graph, and we employ graph convolutional networks to conduct group-wise feature interactions for better representation learning. Experiments on two real-world datasets demonstrate the superiority of MedFACT. The discovered medical findings are also confirmed by the literature, providing valuable medical insights and explanations.  ( 2 min )
    fairDMS: Rapid Model Training by Data and Model Reuse. (arXiv:2204.09805v1 [cs.LG])
    Extracting actionable information from data sources such as the Linac Coherent Light Source (LCLS-II) and Advanced Photon Source Upgrade (APS-U) is becoming more challenging due to the fast-growing data generation rate. The rapid analysis possible with ML methods can enable fast feedback loops that can be used to adjust experimental setups in real-time, for example when errors occur or interesting events are detected. However, to avoid degradation in ML performance over time due to changes in an instrument or sample, we need a way to update ML models rapidly while an experiment is running. We present here a data service and model service to accelerate deep neural network training with a focus on ML-based scientific applications. Our proposed data service achieves a 100x speedup in data labeling compared with the current state of the art. Further, our model service achieves up to a 200x improvement in training speed. Overall, fairDMS achieves up to a 92x speedup in end-to-end model updating time.  ( 2 min )
    Sample-Efficient Reinforcement Learning for POMDPs with Linear Function Approximations. (arXiv:2204.09787v1 [cs.LG])
    Despite the success of reinforcement learning (RL) for Markov decision processes (MDPs) with function approximation, most RL algorithms easily fail if the agent only has partial observations of the state. Such a setting is often modeled as a partially observable Markov decision process (POMDP). Existing sample-efficient algorithms for POMDPs are restricted to the tabular setting where the state and observation spaces are finite. In this paper, we make the first attempt at tackling the tension between function approximation and partial observability. Specifically, we focus on a class of undercomplete POMDPs with linear function approximations, which allows the state and observation spaces to be infinite. For such POMDPs, we show that the optimal policy and value function can be characterized by a sequence of finite-memory Bellman operators. We propose an RL algorithm that constructs optimistic estimators of these operators via reproducing kernel Hilbert space (RKHS) embedding. Moreover, we theoretically prove that the proposed algorithm finds an $\varepsilon$-optimal policy with $\tilde O (1/\varepsilon^2)$ episodes of exploration. Also, this sample complexity only depends on the intrinsic dimension of the POMDP polynomially and is independent of the size of the state and observation spaces. To the best of our knowledge, we develop the first provably sample-efficient algorithm for POMDPs with function approximation.  ( 2 min )
    Scaling Language Model Size in Cross-Device Federated Learning. (arXiv:2204.09715v1 [cs.CL])
    Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a $21$M parameter Transformer that achieves the same perplexity as that of a similarly sized LSTM with $\sim10\times$ smaller client-to-server communication cost and $11\%$ lower perplexity than smaller LSTMs commonly studied in literature.  ( 2 min )
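    One of the combined techniques, communication-efficient compression of client updates, can be illustrated with uniform quantization of a model delta. The sketch below is generic, not the paper's exact optimizer or wire format.

```python
def quantize(update, bits=8):
    """Uniformly quantize a client's model delta to `bits` bits per value, returning
    the integer codes plus the offset and scale needed to dequantize on the server."""
    levels = 2 ** bits - 1
    lo, hi = min(update), max(update)
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [round((v - lo) / scale) for v in update]
    return q, lo, scale

def dequantize(q, lo, scale):
    """Server-side reconstruction of the approximate update."""
    return [lo + qi * scale for qi in q]
```

    At 8 bits per value instead of 32, the client-to-server payload shrinks roughly 4x, at the cost of a reconstruction error bounded by the quantization step.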
    A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines. (arXiv:2204.09772v1 [cs.AI])
    A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning (RL) problems. We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals. Symbolic reward machines augment existing reward machine formalism by allowing transitions to carry predicates and symbolic reward outputs. This formalism lends itself well to inverse reinforcement learning, whereby the key challenge is determining appropriate assignments to the symbolic values from a few expert demonstrations. We propose a hierarchical Bayesian approach for inferring the most likely assignments such that the concretized reward machine can discriminate expert demonstrated trajectories from other trajectories with high accuracy. Experimental results show that learned reward machines can significantly improve training efficiency for complex RL tasks and generalize well across different task environment configurations.  ( 2 min )
    Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach. (arXiv:2204.09746v1 [cs.LG])
    The limited communication resources, e.g., bandwidth and energy, and data heterogeneity across devices are two of the main bottlenecks for federated learning (FL). To tackle these challenges, we first devise a novel FL framework with partial model aggregation (PMA), which only aggregates the lower layers of neural networks responsible for feature extraction while the upper layers corresponding to complex pattern recognition remain at devices for personalization. The proposed PMA-FL is able to address the data heterogeneity and reduce the transmitted information in wireless channels. We then obtain a convergence bound of the framework under a non-convex loss function setting. With the aid of this bound, we define a new objective function, named the scheduled data sample volume, to transfer the original inexplicit optimization problem into a tractable one for device scheduling, bandwidth allocation, computation and communication time division. Our analysis reveals that the optimal time division is achieved when the communication and computation parts of PMA-FL have the same power. Compared with the state-of-the-art benchmarks, the proposed PMA-FL improves accuracy by 2.72% and 11.6% on two typical heterogeneous datasets, i.e., MNIST and CIFAR-10, respectively. In addition, the proposed joint dynamic device scheduling and resource optimization approach achieves slightly higher accuracy than the considered benchmarks while providing satisfactory energy and time reductions: 29% energy or 20% time reduction on MNIST; and 25% energy or 12.5% time reduction on CIFAR-10.  ( 2 min )
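    The core aggregation step is easy to picture; a rough sketch (assuming dict-of-arrays weights, not the paper's implementation): only the named lower layers are averaged across devices, and each device keeps its personalized upper layers:

```python
import numpy as np

def pma_aggregate(client_models, lower_keys):
    # average only the lower (feature-extraction) layers across clients
    shared = {k: np.mean([m[k] for m in client_models], axis=0) for k in lower_keys}
    # each client keeps its own upper layers; only lower layers are overwritten
    return [{**m, **shared} for m in client_models]

clients = [{"conv1": np.full(3, float(i)), "head": np.full(2, float(i))}
           for i in range(3)]
merged = pma_aggregate(clients, lower_keys=["conv1"])
```

    Because only `lower_keys` travel over the wireless channel, the uplink payload shrinks while the personalized `head` layers never leave the device.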
    Exact Formulas for Finite-Time Estimation Errors of Decentralized Temporal Difference Learning with Linear Function Approximation. (arXiv:2204.09801v1 [cs.LG])
    In this paper, we consider the policy evaluation problem in multi-agent reinforcement learning (MARL) and derive exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized temporal difference (TD) learning with linear function approximation. Our analysis hinges upon the fact that the decentralized TD learning method can be viewed as a Markov jump linear system (MJLS). Then standard MJLS theory can be applied to quantify the mean and covariance matrix of the estimation error of the decentralized TD method at every time step. Various implications of our exact formulas on the algorithm performance are also discussed. An interesting finding is that under a necessary and sufficient stability condition, the mean-squared TD estimation error will converge to an exact limit at a specific exponential rate.  ( 2 min )
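    For intuition about the iteration whose error is being characterized, here is single-agent TD(0) with linear function approximation on a toy two-state Markov reward process (an illustrative sketch only; the paper's decentralized multi-agent setting and its MJLS analysis are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition matrix of a 2-state MRP
r = np.array([1.0, -1.0])                # per-state rewards
phi = np.eye(2)                          # features (tabular special case of linear FA)
gamma, alpha = 0.9, 0.05

theta = np.zeros(2)                      # linear value-function weights
s = 0
for _ in range(20000):
    s_next = rng.choice(2, p=P[s])
    # TD(0): move theta along the temporal-difference error
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * delta * phi[s]
    s = s_next

v_true = np.linalg.solve(np.eye(2) - gamma * P, r)   # exact values for reference
```

    With a constant step size, `theta` fluctuates around `v_true` rather than converging exactly; the paper's closed-form mean and covariance formulas quantify precisely this finite-time fluctuation.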
    Matching Writers to Content Writing Tasks. (arXiv:2204.09718v1 [cs.CL])
    Businesses need content. In various forms and formats and for varied purposes. In fact, the content marketing industry is set to be worth $412.88 billion by the end of 2021. However, according to the Content Marketing Institute, creating engaging content is the #1 challenge that marketers face today. We understand that producing great content requires great writers who understand the business and can weave their message into reader (and search engine) friendly content. In this project, the team has attempted to bridge the gap between writers and projects by using AI and ML tools. We used NLP techniques to analyze thousands of publicly available business articles (corpora) to extract various defining factors for each writing sample. Through this project we aim to automate the highly time-consuming, and often biased task of manually shortlisting the most suitable writer for a given content writing requirement. We believe that a tool like this will have far-reaching positive implications for both parties - businesses looking for suitable talent for niche writing jobs as well as experienced writers and Subject Matter Experts (SMEs) wanting to lend their services to content marketing projects. The business gets the content they need, the content writer/SME gets a chance to leverage his or her talent, while the reader gets authentic content that adds real value.  ( 2 min )
    Generative Pre-Trained Transformers for Biologically Inspired Design. (arXiv:2204.09714v1 [cs.CL])
    Biological systems in nature have evolved for millions of years to adapt and survive the environment. Many features they developed can be inspirational and beneficial for solving technical problems in modern industries. This leads to a novel form of design-by-analogy called bio-inspired design (BID). Although BID as a design method has been proven beneficial, the gap between biology and engineering continuously hinders designers from effectively applying the method. Therefore, we explore the recent advance of artificial intelligence (AI) for a computational approach to bridge the gap. This paper proposes a generative design approach based on the pre-trained language model (PLM) to automatically retrieve and map biological analogy and generate BID in the form of natural language. The latest generative pre-trained transformer, namely GPT-3, is used as the base PLM. Three types of design concept generators are identified and fine-tuned from the PLM according to the looseness of the problem space representation. Machine evaluators are also fine-tuned to assess the correlation between the domains within the generated BID concepts. The approach is then tested via a case study in which the fine-tuned models are applied to generate and evaluate light-weighted flying car concepts inspired by nature. The results show our approach can generate BID concepts with good performance.  ( 2 min )
    A majorization-minimization algorithm for nonnegative binary matrix factorization. (arXiv:2204.09741v1 [cs.LG])
    This paper tackles the problem of decomposing binary data using matrix factorization. We consider the family of mean-parametrized Bernoulli models, a class of generative models that are well suited for modeling binary data and enables interpretability of the factors. We factorize the Bernoulli parameter and consider an additional Beta prior on one of the factors to further improve the model's expressive power. While similar models have been proposed in the literature, they only exploit the Beta prior as a proxy to ensure a valid Bernoulli parameter in a Bayesian setting; in practice it reduces to a uniform or uninformative prior. Besides, estimation in these models has focused on costly Bayesian inference. In this paper, we propose a simple yet very efficient majorization-minimization algorithm for maximum a posteriori estimation. Our approach leverages the Beta prior whose parameters can be tuned to improve performance in matrix completion tasks. Experiments conducted on three public binary datasets show that our approach offers an excellent trade-off between prediction performance, computational complexity, and interpretability.  ( 2 min )
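    The model can be prototyped in a few lines of projected gradient ascent on the MAP objective (a hedged rank-1 sketch for intuition only; the paper's majorization-minimization updates are different and more efficient, and all variable names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 20
w_true = rng.uniform(0.2, 0.9, (m, 1))
h_true = rng.uniform(0.2, 0.9, (1, n))
V = rng.binomial(1, w_true @ h_true)        # binary data; Bernoulli mean is W @ H

a, b = 2.0, 2.0                             # Beta(a, b) prior on the entries of H
eps, lr = 1e-3, 0.005
W = np.full((m, 1), 0.5)
H = np.full((1, n), 0.5)
for _ in range(2000):
    Pm = W @ H                              # mean parameter, kept inside (0, 1)
    G = V / Pm - (1 - V) / (1 - Pm)         # gradient of log-likelihood w.r.t. Pm
    W_new = np.clip(W + lr * (G @ H.T), eps, 1 - eps)
    H = np.clip(H + lr * (W.T @ G + (a - 1) / H - (b - 1) / (1 - H)), eps, 1 - eps)
    W = W_new
```

    Clipping to `[eps, 1 - eps]` is a crude stand-in for the constraint that the factor entries, and hence the Bernoulli mean, stay valid; the MM algorithm in the paper handles this by construction.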
    FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow. (arXiv:2204.09679v1 [cs.CV])
    Super-resolution is innately ill-posed: a single low-resolution (LR) image can correspond to multiple high-resolution (HR) images. Recent studies on flow-based algorithms address this ill-posedness by learning the super-resolution space and predicting diverse HR outputs. Unfortunately, the diversity of the super-resolution outputs is still unsatisfactory, and the outputs from flow-based models usually suffer from undesired artifacts, which cause low-quality outputs. In this paper, we propose FS-NCSR, which produces more diverse and higher-quality super-resolution outputs than existing flow-based approaches by using frequency separation and noise conditioning. As the sharpness and high-quality detail of the image rely on its high-frequency information, FS-NCSR only estimates the high-frequency information of the high-resolution outputs without redundant low-frequency components. Through this, FS-NCSR significantly improves the diversity score without significant image quality degradation compared to NCSR, the winner of the previous NTIRE 2021 challenge.  ( 2 min )
    Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?. (arXiv:2204.09664v2 [cs.LG] UPDATED)
    We study the theory of neural networks (NNs) from the lens of classical nonparametric regression problems, with a focus on their ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function spaces and sample sizes. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0<p<1$) of the coefficient vector of an end-to-end learned function basis, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such a Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.  ( 2 min )
    The Silent Problem -- Machine Learning Model Failure -- How to Diagnose and Fix Ailing Machine Learning Models. (arXiv:2204.10227v1 [cs.LG])
    The COVID-19 pandemic has dramatically changed how healthcare is delivered to patients, how patients interact with healthcare providers, and how healthcare information is disseminated to both healthcare providers and patients. Analytical models that were trained and tested pre-pandemic may no longer be performing up to expectations, since machine learning (ML) depends on the basic principle that what happened in the past is likely to repeat in the future. ML faces two important degradation principles: concept drift, when the underlying properties and characteristics of the variables change, and data drift, when the data distributions, probabilities, covariates, and other variable relationships change; both are prime culprits of model failure. Therefore, detecting and diagnosing drift in existing models has become an imperative. Perhaps even more important is a shift in our mindset towards a conscious recognition that drift is inevitable, and model building must incorporate intentional resilience, the ability to offset and recover quickly from failure, and proactive robustness, avoiding failure by developing models that are less vulnerable to drift and disruption.  ( 2 min )
    Strong posterior contraction rates via Wasserstein dynamics. (arXiv:2203.10754v2 [math.ST] UPDATED)
    In this paper, we develop a novel approach to posterior contraction rates (PCRs), for both finite-dimensional (parametric) and infinite-dimensional (nonparametric) Bayesian models. Critical to our approach is the combination of an assumption of local Lipschitz-continuity for the posterior distribution with a dynamic formulation of the Wasserstein distance, here referred to as Wasserstein dynamics, which allows us to set forth a connection between the problem of establishing PCRs and some classical problems in mathematical analysis, probability theory and mathematical statistics: the Laplace method for approximating integrals, Sanov's large deviation principles in the Wasserstein distance, rates of convergence of the mean Glivenko-Cantelli theorem, and estimates of weighted Poincar\'e-Wirtinger constants. Under dominated Bayesian models, we present two main results: i) a theorem on PCRs for the regular infinite-dimensional exponential family of statistical models; ii) a theorem on PCRs for a general dominated statistical model. Some applications of our results are presented for the regular parametric model, the multinomial model, the finite-dimensional and the infinite-dimensional logistic-Gaussian model and the infinite-dimensional linear regression. In general, our results lead to optimal PCRs in finite dimension, whereas in infinite dimension it is shown how the prior distribution may affect PCRs. With regard to infinite-dimensional Bayesian models for density estimation, our approach to PCRs is the first to consider strong norm distances on parameter spaces of functions, such as Sobolev-like norms, as most of the approaches in the classical (frequentist) and Bayesian literature deal with spaces of density functions endowed with $\mathrm{L}^p$ norms or the Hellinger distance.  ( 2 min )
    Towards Resolving Propensity Contradiction in Offline Recommender Learning. (arXiv:1910.07295v6 [stat.ML] UPDATED)
    We study offline recommender learning from explicit rating feedback in the presence of selection bias. A current promising solution for the bias is the inverse propensity score (IPS) estimation. However, the performance of existing propensity-based methods can suffer significantly from the propensity estimation bias. In fact, most of the previous IPS-based methods require some amount of missing-completely-at-random (MCAR) data to accurately estimate the propensity. This leads to a critical self-contradiction; IPS is ineffective without MCAR data, even though it originally aims to learn recommenders from only missing-not-at-random feedback. To resolve this propensity contradiction, we derive a propensity-independent generalization error bound and propose a novel algorithm to minimize the theoretical bound via adversarial learning. Our theory and algorithm do not require a propensity estimation procedure, thereby leading to a well-performing rating predictor without the true propensity information. Extensive experiments demonstrate that the proposed approach is superior to a range of existing methods both in rating prediction and ranking metrics in practical settings without MCAR data.  ( 2 min )
    Wrapped Distributions on homogeneous Riemannian manifolds. (arXiv:2204.09790v1 [math.ST])
    We provide a general framework for constructing probability distributions on Riemannian manifolds, taking advantage of area-preserving maps and isometries. Control over distributions' properties, such as parameters, symmetry and modality yield a family of flexible distributions that are straightforward to sample from, suitable for use within Monte Carlo algorithms and latent variable models, such as autoencoders. As an illustration, we empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model. Finally, we take advantage of the generalized description of this framework to posit questions for future work.  ( 2 min )
    Computationally Efficient and Statistically Optimal Robust Low-rank Matrix and Tensor Estimation. (arXiv:2203.00953v3 [math.ST] UPDATED)
    Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent are proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient with linear convergence but also is statistically optimal, be the noise Gaussian or heavy-tailed. Convergence theory is established for a general framework and specific applications to absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves as in a typical non-smooth optimization that requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator which is already observed in the existing literature. Interestingly, during phase two, RsGrad converges linearly as if minimizing a smooth and strongly convex objective function and thus a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise to the non-smooth robust losses in an area close but not too close to the truth. Lastly, RsGrad is applicable for low-rank tensor estimation under heavy-tailed noise where a statistically optimal rate is attainable with the same phenomenon of dual-phase convergence, and a novel shrinkage-based second-order moment method is guaranteed to deliver a warm initialization. Numerical simulations confirm our theoretical discovery and showcase the superiority of RsGrad over prior methods.
    Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning. (arXiv:2106.09226v2 [cs.LG] UPDATED)
    Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
    Bayesian Learning via Neural Schr\"odinger-F\"ollmer Flows. (arXiv:2111.10510v8 [stat.ML] UPDATED)
    In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control (i.e. Schr\"odinger bridges). We advocate stochastic control as a finite time and low variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics (SGLD). Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to already existing VI routines in SDE-based models.
    From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness. (arXiv:2110.03753v3 [cs.LG] UPDATED)
    Message Passing Neural Networks (MPNNs) are a common type of Graph Neural Network (GNN), in which each node's representation is computed recursively by aggregating representations (messages) from its immediate neighbors akin to a star-shaped pattern. MPNNs are appealing for being efficient and scalable; however, their expressiveness is upper-bounded by the 1st-order Weisfeiler-Lehman isomorphism test (1-WL). In response, prior works propose highly expressive models at the cost of scalability and sometimes generalization performance. Our work stands between these two regimes: we introduce a general framework to uplift any MPNN to be more expressive, with limited scalability overhead and greatly improved practical performance. We achieve this by extending local aggregation in MPNNs from star patterns to general subgraph patterns (e.g., k-egonets): in our framework, each node representation is computed as the encoding of a surrounding induced subgraph rather than the encoding of immediate neighbors only (i.e. a star). We choose the subgraph encoder to be a GNN (mainly MPNNs, considering scalability) to design a general framework that serves as a wrapper to uplift any GNN. We call our proposed method GNN-AK (GNN As Kernel), as the framework resembles a convolutional neural network by replacing the kernel with GNNs. Theoretically, we show that our framework is strictly more powerful than 1&2-WL, and is not less powerful than 3-WL. We also design subgraph sampling strategies which greatly reduce memory footprint and improve speed while maintaining performance. Our method sets new state-of-the-art performance by large margins for several well-known graph ML tasks; specifically, 0.08 MAE on ZINC, 74.79% and 86.887% accuracy on CIFAR10 and PATTERN, respectively.
    Backplay: "Man muss immer umkehren". (arXiv:1807.06919v5 [cs.LG] UPDATED)
    Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency. This includes reward shaping, behavioral cloning, and reverse curriculum generation.
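    The curriculum itself is simple to express; a minimal sketch (hypothetical helper with a linear schedule; the paper uses window-based schedules over training epochs):

```python
def backplay_start_state(demo, epoch, total_epochs):
    # Backplay: start near the end of the demonstration early in training,
    # then move the starting point backwards toward the true initial state
    frac = min(1.0, epoch / total_epochs)          # training progress in [0, 1]
    idx = int(round((1 - frac) * (len(demo) - 1)))
    return demo[idx]

demo = ["s0", "s1", "s2", "s3", "goal"]            # states along one demonstration
```

    Early episodes begin one step from the goal, so sparse reward is reached almost immediately; by the end of training the agent starts from the environment's true initial state.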
    Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data. (arXiv:2009.09139v3 [cs.LG] UPDATED)
    Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single task fine-tuning methods while being parameter and data efficient (using around 66% of the data for weight updates). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms by 0.7-1.0% models that use MTL and single task fine-tuning. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets. Our code is publicly available at https://github.com/CAMTL/CA-MTL.
    Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data. (arXiv:2010.03622v5 [cs.LG] UPDATED)
    Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic "expansion" assumption, which states that a low probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.
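    A toy pseudolabeling loop illustrates the mechanism being analyzed (a nearest-centroid stand-in classifier; the confidence threshold is a crude proxy for the input-consistency regularization in the paper, and all names are illustrative):

```python
import numpy as np

def fit_centroids(X, y):
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def predict_soft(centroids, X):
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    p = np.exp(-d)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
X_lab = np.array([[0.0, 0.0], [4.0, 0.0]])       # one labeled point per class
y_lab = np.array([0, 1])
X_unlab = np.vstack([rng.normal([0, 0], 0.3, (20, 2)),
                     rng.normal([4, 0], 0.3, (20, 2))])

cent = fit_centroids(X_lab, y_lab)
for _ in range(3):                               # self-training rounds
    p = predict_soft(cent, X_unlab)
    keep = p.max(axis=1) > 0.8                   # keep only confident pseudolabels
    X = np.vstack([X_lab, X_unlab[keep]])
    y = np.concatenate([y_lab, p[keep].argmax(axis=1)])
    cent = fit_centroids(X, y)
```

    Because each class's unlabeled points "expand" into a tight neighborhood around their center, confident pseudolabels are overwhelmingly correct and the refit classifier improves, matching the intuition behind the expansion assumption.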
    Scale Dependencies and Self-Similarity Through Wavelet Scattering Covariance. (arXiv:2204.10177v1 [physics.data-an])
    We introduce a scattering covariance matrix which provides non-Gaussian models of time-series having stationary increments. A complex wavelet transform computes signal variations at each scale. Dependencies across scales are captured by the joint covariance across time and scales of complex wavelet coefficients and their modulus. This covariance is nearly diagonalized by a second wavelet transform, which defines the scattering covariance. We show that this set of moments characterizes a wide range of non-Gaussian properties of multi-scale processes. This is analyzed for a variety of processes, including fractional Brownian motions, Poisson, multifractal random walks and Hawkes processes. We prove that self-similar processes have a scattering covariance matrix which is scale invariant. This property can be estimated numerically and defines a class of wide-sense self-similar processes. We build maximum entropy models conditioned by scattering covariance coefficients, and generate new time-series with a microcanonical sampling algorithm. Applications are shown for highly non-Gaussian financial and turbulence time-series.
    Beyond the density operator and Tr(\rho A): Exploiting the higher-order statistics of random-coefficient pure states for quantum information processing. (arXiv:2204.10031v1 [quant-ph])
    Two types of states are widely used in quantum mechanics, namely (deterministic-coefficient) pure states and statistical mixtures. A density operator can be associated with each of them. We here address a third type of states, that we previously introduced in a more restricted framework. These states generalize pure ones by replacing each of their deterministic ket coefficients by a random variable. We therefore call them Random-Coefficient Pure States, or RCPS. We analyze their properties and their relationships with both types of usual states. We show that RCPS contain much richer information than the density operator and mean of observables that we associate with them. This occurs because the latter operator only exploits the second-order statistics of the random state coefficients, whereas their higher-order statistics contain additional information. That information can be accessed in practice with the multiple-preparation procedure that we propose for RCPS, by using second-order and higher-order statistics of associated random probabilities of measurement outcomes. Exploiting these higher-order statistics opens the way to a very general approach for performing advanced quantum information processing tasks. We illustrate the relevance of this approach with a generic example, dealing with the estimation of parameters of a quantum process and thus related to quantum process tomography. This parameter estimation is performed in the non-blind (i.e. supervised) or blind (i.e. unsupervised) mode. We show that this problem cannot be solved by using only the density operator \rho of an RCPS and the associated mean value Tr(\rho A) of the operator A that corresponds to the considered physical quantity. We succeed in solving this problem by exploiting a fourth-order statistical parameter of state coefficients, in addition to second-order statistics. Numerical tests validate this result.
    Infographics Wizard: Flexible Infographics Authoring and Design Exploration. (arXiv:2204.09904v1 [cs.HC])
    Infographics are an aesthetic visual representation of information following specific design principles of human perception. Designing infographics can be a tedious process for non-experts and time-consuming, even for professional designers. With the help of designers, we propose a semi-automated infographic framework for general structured and flow-based infographic design generation. For novice designers, our framework automatically creates and ranks infographic designs for a user-provided text with no requirement for design input. However, expert designers can still provide custom design inputs to customize the infographics. We will also contribute an individual visual group (VG) designs dataset (in SVG), along with a 1k complete infographic image dataset with segmented VGs in this work. Evaluation results confirm that by using our framework, designers from all expertise levels can generate generic infographic designs faster than existing methods while maintaining the same quality as hand-designed infographics templates.
    Inducing Gaussian Process Networks. (arXiv:2204.09889v1 [cs.LG])
    Gaussian processes (GPs) are powerful but computationally expensive machine learning models, requiring an estimate of the kernel covariance matrix for every prediction. In large and complex domains, such as graphs, sets, or images, the choice of suitable kernel can also be non-trivial to determine, providing an additional obstacle to the learning task. Over the last decade, these challenges have resulted in significant advances being made in terms of scalability and expressivity, exemplified by, e.g., the use of inducing points and neural network kernel approximations. In this paper, we propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains while also facilitating scalable gradient-based learning methods. We consider both regression and (binary) classification tasks and report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods. We also demonstrate how IGNs can be used to effectively model complex domains using neural network architectures.
    Murmurations of elliptic curves. (arXiv:2204.10140v1 [math.NT])
    We investigate the average value of the $p$th Dirichlet coefficients of elliptic curves for a prime p in a fixed conductor range with given rank. Plotting this average yields a striking oscillating pattern, the details of which vary with the rank. Based on this observation, we perform various data-scientific experiments with the goal of classifying elliptic curves according to their ranks.
    A majorization-minimization algorithm for nonnegative binary matrix factorization. (arXiv:2204.09741v1 [cs.LG])
    This paper tackles the problem of decomposing binary data using matrix factorization. We consider the family of mean-parametrized Bernoulli models, a class of generative models that are well suited for modeling binary data and enables interpretability of the factors. We factorize the Bernoulli parameter and consider an additional Beta prior on one of the factors to further improve the model's expressive power. While similar models have been proposed in the literature, they only exploit the Beta prior as a proxy to ensure a valid Bernoulli parameter in a Bayesian setting; in practice it reduces to a uniform or uninformative prior. Besides, estimation in these models has focused on costly Bayesian inference. In this paper, we propose a simple yet very efficient majorization-minimization algorithm for maximum a posteriori estimation. Our approach leverages the Beta prior whose parameters can be tuned to improve performance in matrix completion tasks. Experiments conducted on three public binary datasets show that our approach offers an excellent trade-off between prediction performance, computational complexity, and interpretability.
    Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation. (arXiv:2110.10461v3 [cs.LG] UPDATED)
    Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.  ( 2 min )
    Intact-VAE: Estimating Treatment Effects under Unobserved Confounding. (arXiv:2101.06662v3 [stat.ML] UPDATED)
    NOTE: This preprint has a flawed theoretical formulation. Please avoid it and refer to the ICLR22 publication https://openreview.net/forum?id=q7n2RngwOM. Also, arXiv:2109.15062 contains some new ideas on unobserved Confounding. As an important problem of causal inference, we discuss the identification and estimation of treatment effects under unobserved confounding. Representing the confounder as a latent variable, we propose Intact-VAE, a new variant of variational autoencoder (VAE), motivated by the prognostic score that is sufficient for identifying treatment effects. We theoretically show that, under certain settings, treatment effects are identified by our model, and further, based on the identifiability of our model (i.e., determinacy of representation), our VAE is a consistent estimator with representation balanced for treatment groups. Experiments on (semi-)synthetic datasets show state-of-the-art performance under diverse settings.  ( 2 min )
    The NIST CTS Speaker Recognition Challenge. (arXiv:2204.10228v1 [eess.AS])
    The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the CTS challenge since August 2020. The current iteration of the CTS Challenge is a leaderboard-style speaker recognition evaluation using telephony data extracted from the unexposed portions of the Call My Net 2 (CMN2) and Multi-Language Speech (MLS) corpora collected by the LDC. The CTS Challenge is currently organized in a similar manner to the SRE19 CTS Challenge, offering only an open training condition using two evaluation subsets, namely Progress and Test. Unlike in the SRE19 Challenge, no training or development set was initially released, and NIST has publicly released the leaderboards on both subsets for the CTS Challenge. Which subset (i.e., Progress or Test) a trial belongs to is unknown to challenge participants, and each system submission needs to contain outputs for all of the trials. The CTS Challenge has also served, and will continue to do so, as a prerequisite for entrance to the regular SREs (such as SRE21). Since August 2020, a total of 53 organizations (forming 33 teams) from academia and industry have participated in the CTS Challenge and submitted more than 4400 valid system outputs. This paper presents an overview of the evaluation and several analyses of system performance for some primary conditions in the CTS Challenge. The CTS Challenge results thus far indicate remarkable improvements in performance due to 1) speaker embeddings extracted using large-scale and complex neural network architectures such as ResNets along with angular margin losses for speaker embedding extraction, 2) extensive data augmentation, 3) the use of large amounts of in-house proprietary data from a large number of labeled speakers, 4) long-duration fine-tuning.  ( 2 min )
    Scalable Sensitivity and Uncertainty Analysis for Causal-Effect Estimates of Continuous-Valued Interventions. (arXiv:2204.10022v1 [cs.LG])
    Estimating the effects of continuous-valued interventions from observational data is critically important in fields such as climate science, healthcare, and economics. Recent work focuses on designing neural-network architectures and regularization functions to allow for scalable estimation of average and individual-level dose response curves from high-dimensional, large-sample data. Such methodologies assume ignorability (all confounding variables are observed) and positivity (all levels of treatment can be observed for every unit described by a given covariate value), which are especially challenged in the continuous treatment regime. Developing scalable sensitivity and uncertainty analyses that allow us to understand the ignorance induced in our estimates when these assumptions are relaxed receives less attention. Here, we develop a continuous treatment-effect marginal sensitivity model (CMSM) and derive bounds that agree with both the observed data and a researcher-defined level of hidden confounding. We introduce a scalable algorithm to derive the bounds and uncertainty-aware deep models to efficiently estimate these bounds for high-dimensional, large-sample observational data. We validate our methods using both synthetic and real-world experiments. For the latter, we work in concert with climate scientists interested in evaluating the climatological impacts of human emissions on cloud properties using satellite observations from the past 15 years: a finite-data problem known to be complicated by the presence of a multitude of unobserved confounders.  ( 2 min )
    Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces. (arXiv:1905.09449v5 [cs.LG] UPDATED)
    The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite the benefits of over-parameterization, a huge amount of parameters makes deep networks cumbersome in daily life applications. Though techniques such as pruning and distillation are developed, they are expensive in fully training a dense network as backward selection methods, and there is still a void on systematically exploring forward selection methods for learning structural sparsity in deep networks. To fill in this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generate a family of models from simple to complex ones along the dynamics via coupling a pair of parameters, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This kind of differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks could be established under the Kurdyka-Lojasiewicz framework. Experimental evidence shows that our method achieves comparable and even better performance than the competitive optimizers in exploring the sparse structure of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, our method unveils `winning tickets' in early epochs: the effective sparse network structures with comparable test accuracy to fully trained over-parameterized models, that are further transferable to similar alternative tasks. Furthermore, our method is able to grow networks efficiently with adaptive filter configurations, demonstrating a good performance with much less computational cost. Codes and models can be downloaded at {https://github.com/DessiLBI2020/DessiLBI}.  ( 3 min )
    Out-of-distribution generalization for learning quantum dynamics. (arXiv:2204.10268v1 [quant-ph])
    Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are assumed to be drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a distribution different from the training distribution. In this work, we prove out-of-distribution generalization for the task of learning an unknown unitary using a QNN and for a broad class of training and testing distributions. In particular, we show that one can learn the action of a unitary on entangled states using only product state training data. We numerically illustrate this by showing that the evolution of a Heisenberg spin chain can be learned using only product training states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics using near term quantum computers and quantum experiments, and further opens up new methods for both the classical and quantum compilation of quantum circuits.  ( 2 min )
    Path-Specific Objectives for Safer Agent Incentives. (arXiv:2204.10018v1 [cs.AI])
    We present a general framework for training safe agents whose naive incentives are unsafe. As an example, manipulative or deceptive behaviour can improve rewards but should be avoided. Most approaches fail here: agents maximize expected return by any means necessary. We formally describe settings with 'delicate' parts of the state which should not be used as a means to an end. We then train agents to maximize the causal effect of actions on the expected return which is not mediated by the delicate parts of state, using Causal Influence Diagram analysis. The resulting agents have no incentive to control the delicate state. We further show how our framework unifies and generalizes existing proposals.  ( 2 min )
    Ultra Marginal Feature Importance. (arXiv:2204.09938v1 [stat.ML])
    Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. The development of marginal feature importance methods, such as marginal contribution feature importance, attempts to break this trend by providing a useful framework for explaining relationships in data in an interpretable fashion. In this work, we generalize the framework of marginal contribution feature importance to improve performance with regards to detecting correlated interactions and reducing runtime. To do so, we consider "information subsets" of the set of features $F$ and show that our importance metric can be computed directly after applying fair representation learning methods from the AI fairness literature. The methods of optimal transport and linear regression are considered and explored experimentally for removing all the information of our feature of interest $f$ from the feature set $F$. Given these implementations, we show on real and simulated data that ultra marginal feature importance performs at least as well as marginal contribution feature importance, with substantially faster computation time and better performance in the presence of correlated interactions and unrelated features.  ( 2 min )

    I have a time series of time, temp, humidity, apparent temp, and ac/heater/fan state
    I want to create a NN that takes these readings and answers questions like "what will the readings be in 5 minutes if I turn the AC on?". I'm thinking of training it with "the angle of the sun at that time", "temp", "humidity", and "ac/heater/fan state" (3), and then extracting data pairs spaced 5 minutes apart where the system stayed in that state for the entire interval. I would then use the apparent temperature from 5 minutes later as the training output, so the NN ultimately answers the question "what would the apparent temperature be if the system were in the given state for the next 5 minutes?" Am I on the right track here? submitted by /u/HasFiveVowels [link] [comments]  ( 1 min )
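    A minimal sketch of the pair-extraction step described above, assuming readings arrive as a time-sorted list of dicts (the field names are placeholders, and the sun-angle feature is omitted for brevity):

    ```python
    from datetime import timedelta

    def make_pairs(readings, horizon=timedelta(minutes=5)):
        """Extract (input, target) pairs spaced `horizon` apart, keeping only
        windows in which the ac/heater/fan state never changed.

        `readings` is a list of dicts sorted by time, each with keys
        'time', 'temp', 'humidity', 'apparent', 'state'.
        """
        pairs = []
        for i, start in enumerate(readings):
            # find the first reading at least `horizon` later
            for j in range(i + 1, len(readings)):
                end = readings[j]
                if end['time'] - start['time'] >= horizon:
                    # reject the window if the state changed anywhere inside it
                    if all(r['state'] == start['state'] for r in readings[i:j + 1]):
                        features = (start['temp'], start['humidity'], start['state'])
                        pairs.append((features, end['apparent']))
                    break
        return pairs
    ```

    The target is the apparent temperature at the end of the window, so a regressor trained on these pairs answers exactly the "what happens in 5 minutes if the state stays fixed" question.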
    An AI painting some colorful pitbulls
    submitted by /u/p0goniphaft111 [link] [comments]
    Building A Pictionary App (sketch recognition model) with Gradio
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Is there an AI which I can use to edit images (selfies etc.)?
    Like I mark some areas of my pictures and then select what should happen with them? submitted by /u/xXLisa28Xx [link] [comments]
    What AI can I use to make caricatures from pictures of people?
    Is Artbreeder the best way to do it, or is there a better way? submitted by /u/xXNOdrugsForMEXx [link] [comments]  ( 1 min )
    Learning or working with AI? Come join us, we are a Discord Community with over 20'000 members! Ask questions, find teammates, share your projects, attend events, and much more to come!
    Programming is way more fun when you learn or work with someone. Help each other, ask questions, brainstorm, etc. There is just so much benefit to joining a community when you are in this field, especially when you cannot find the question you are looking for on Stack Overflow! 😉 The same goes for AI, which is why, a little less than two years ago, I created a Discord server where anyone learning or working in the field could come and share their projects, learn together, work together, and much more. The community now has over 20,000 members, which is unbelievable! So glad to see it growing and to see everyone so active. We also have an amazing partnership with an AI company coming up that is super exciting for the community. You definitely want to be there to enjoy all the benefits they will give us. Come join us if you are in the field of AI! https://discord.gg/learnaitogether submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    Analyse sentiment/tonality in social networks
    submitted by /u/akolonin [link] [comments]  ( 1 min )
    [Research] Explaining the Black Box Optimization Competition Winner Algorithm-HEBO Algorithm of AI Top Conference NeurIPS 2020
    submitted by /u/Creative_Habit_6868 [link] [comments]  ( 3 min )
    [R] GAM(e) changer or not? An evaluation of interpretable machine learning models based on additive model constraints
    https://arxiv.org/abs/2204.09123 https://www.researchgate.net/publication/360079336_GAMe_changer_or_not_An_evaluation_of_interpretable_machine_learning_models_based_on_additive_model_constraints submitted by /u/Positive_Ad_1090 [link] [comments]
    How can you differentiate Kornia SIFT descriptor? [P]
    Kornia is a differentiable library for computer vision based on PyTorch. Does anyone have experience with their SIFT descriptor? What exactly can you differentiate through? submitted by /u/avd4292 [link] [comments]
    [D] Evaluation and Selecting Models: Base on Loss or Metrics?
    When it comes to evaluating and selecting a model, should one focus on minimizing loss (e.g., sparse categorical cross-entropy) or on obtaining highly rated metrics (e.g., F1)? Often the model with the best metrics produces a higher loss on the validation/test sets than models with lower-rated metrics. Some would say to focus on metrics, since loss is there for the machine to optimize during learning: what stays in training, stays in training. However, wouldn't loss also be an important element to consider, since it too describes the performance of the model, particularly when computed on the test set? How should one prioritize: do metrics or loss rule all, or should one seek a balance? submitted by /u/Hydraze [link] [comments]  ( 1 min )
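    One pragmatic compromise is to rank candidates by the validation metric and break ties with the lower loss, so that among equally accurate models the better-calibrated one wins. A self-contained sketch (the candidate names and probabilities below are hypothetical):

    ```python
    import math

    def log_loss(y_true, y_prob):
        # mean negative log-likelihood of the true class
        eps = 1e-12
        return -sum(math.log(max(eps, p[t])) for t, p in zip(y_true, y_prob)) / len(y_true)

    def f1_binary(y_true, y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        if tp == 0:
            return 0.0
        precision, recall = tp / (tp + fp), tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    def select_model(candidates, y_true):
        """Rank by validation F1 first; break ties with the lower log loss."""
        def key(item):
            name, probs = item
            preds = [int(p[1] >= 0.5) for p in probs]
            return (-f1_binary(y_true, preds), log_loss(y_true, probs))
        return min(candidates.items(), key=key)[0]
    ```

    This keeps the metric as the primary criterion (it is what the application cares about) while still letting loss carry information about calibration.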
    [R] Optimize clustering for downstream task
    Assume we have a 2-step algorithm: 1) aggregate data points into clusters; 2) feed the clusters to a downstream task (e.g. classification, regression, etc.). Is there any work that explores how to optimize the clustering in 1) to achieve the best performance on the downstream task 2)? One example would be a differentiable clustering algorithm that receives gradients from the downstream task, or a parametrized clustering algorithm whose parameters are automatically tuned to increase the performance of the downstream task. I have found very little on this topic in the literature; could you point me to some relevant work? submitted by /u/fedetask [link] [comments]  ( 1 min )
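    One concrete version of the differentiable-clustering idea is to make the assignment step soft: a softmax over negative squared distances is smooth in the centroids, so a downstream loss can backpropagate into them (e.g. via autograd in PyTorch or JAX). A minimal NumPy sketch of just the forward pass:

    ```python
    import numpy as np

    def soft_assign(X, centroids, temperature=1.0):
        """Soft cluster assignments: softmax over negative squared distances.

        Every operation here is smooth, so when re-expressed with an autograd
        framework, gradients from a downstream task flow into `centroids`.
        """
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (n, k) distances
        logits = -d2 / temperature
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        w = np.exp(logits)
        return w / w.sum(axis=1, keepdims=True)      # rows sum to 1
    ```

    The temperature interpolates between hard k-means-style assignments (low temperature) and uniform assignments (high temperature); the alternative mentioned in the post, a parametrized clustering tuned by downstream validation score, amounts to treating k and temperature as hyperparameters instead.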
    [D] What is a good emoji aware pre-trained language model?
    I am classifying social media posts (Facebook, Instagram), where emoji sometimes make up 100% of the content. For example, you may want to tag "🤮🤮🤮" as in need of moderation, and "🤔🤔🤔" as prioritized for a response. Looking for a good model to fine-tune, I found BERTweet, which seems at least somewhat emoji-aware. However, it also has a ton of out-of-vocabulary results, both for emoji and for semi-common English words, despite its liberal use of emoji.demojize and its splitting up of more complex emoji: ​ https://preview.redd.it/t6ai3o8le1v81.png?width=687&format=png&auto=webp&s=c16157addbe1b3d34858708f3e6c7517e64d26ec A model like xlm-roberta-base, with a larger vocabulary (250k) and more robust tokenization, seems to have some 500 emoji directly in its vocabulary, without converting them to text. This seems potentially more promising, but it also guarantees that a token like 🤮 is simply out of vocabulary rather than being interpreted via word pieces. Has anyone here had experience dealing with emoji in text classification, and what approaches were most successful? submitted by /u/sanderbaduk [link] [comments]  ( 2 min )
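    A quick sanity check before committing to a model is to measure what fraction of the emoji in your corpus have dedicated vocabulary entries. The sketch below approximates this on raw codepoints; real tokenizers (e.g. via Hugging Face's tokenizer vocabularies) handle modifier sequences and byte fallback differently, so treat it as a rough estimate:

    ```python
    import unicodedata

    def vocab_coverage(texts, vocab):
        """Fraction of emoji codepoints in `texts` that appear directly in
        `vocab` (a set of tokens), rather than falling back to pieces."""
        emoji = [ch for text in texts for ch in text
                 if unicodedata.category(ch) == 'So']  # Symbol, other
        if not emoji:
            return 1.0
        covered = sum(1 for ch in emoji if ch in vocab)
        return covered / len(emoji)
    ```

    Running this over a sample of your posts for each candidate tokenizer's vocabulary gives a direct, model-agnostic comparison of emoji support.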
    [D] What is the best method to use metric network at finetune after contrastive learning?
    Hi, I have a question about how to use a metric network after contrastive learning. If I have trained a network well with NCE loss, I would like to fine-tune this network so that it matches the best output for a given input (as used when calculating the NCE loss). Is there a good way to do this? ​ Thank you for reading! submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    [D] Opinions needed - Anyone interested in mock peer review?
    We’d like to know if anyone is interested in participating in a mock peer review? Basically if you have a paper you’d like to get feedback on, and would like to review others’ papers in exchange, you’re welcome to continue reading. We are gauging public interest in mock peer review and exploring the possibility to host the reviews on DouBlind. We’d like to know your answers to the following questions: Are you interested in mock peer review? Do you want to do this privately (paper and review are kept inside a small group) or openly (paper and review are open)? How many papers do you like to review? Do you have any concerns? submitted by /u/DouBlindDotCOM [link] [comments]  ( 2 min )
    [Research] Explaining the Black Box Optimization Competition Winner Algorithm-HEBO Algorithm of AI Top Conference NeurIPS 2020
    This is reproduced from Zhihu and translated with DeepL, shared only for enthusiasts to discuss. ​ MindSpore is an end-edge-cloud collaborative, full-scenario AI open source framework that combines the flexibility needed for academic research with the high-performance requirements of industry. It supports end-edge-cloud scenarios across the board and offers developers simpler programming, easier debugging, superior performance, and more flexible deployment. It has received widespread attention and adoption in industry, has been open source since 2020-03-28, and is the open source project with the highest index on Gitee. You are welcome to participate in open source contributions, model crowdsourcing collaboration, industry innovation and application, algorithm innovation, and academic collabora…  ( 3 min )
    Data Pre-Processing in TF-Agents
    Hi everyone, this is my first post, please go easy on me. I'm currently playing around with a bigger model in TF-Agents. So far I have worked only with structured data (TF, SKLearn, Pandas...). Now I'm struggling a bit with preprocessing and where in the architecture to place it. I use multiple inputs and an encoding layer for each of them. For training the encoders I used some SKLearn pre-processors (StandardScaler, MinMaxScaler, KBinsDiscretizer). I want to reuse the pre-processing pipeline in the model, or extract its information for other pre-processing mechanisms (e.g. pre-processing TF layers). The options I came up with so far:
    1) Incorporate it directly into the environment and return the pre-processed observation. Pro: easy, I can probably reuse my SKLearn pipeline. Contra: I'd like to keep the architecture clean, so the environment should only emit raw values, not values prepared for a particular model.
    2) Use an environment wrapper around the "raw" environment. Pro: the "raw" environment needs no tuning. Contra: not sure whether I can use my pipeline here, or whether I'm taking a bad path.
    3) Use pre-processing TF layers. Pro: most of the API is there and can be used in my encoder networks; this seems to be the TFic way. Contra: the SKLearn pre-processors store values per column, while the layers (e.g. Rescaling) seem to take a single configuration for a whole tensor. I could create a layer for each value in the tensor, but that doesn't feel like how it is supposed to be used.
    If you have more options, or can share your experience with one of the options above, I would be very glad. submitted by /u/Kjiessar [link] [comments]  ( 1 min )
    Masking in RNN in the actor network
    I am using PPO in the context of multi-agent RL. I was wondering if PyTorch has a way of handling when hidden states should be reinitialized to zeros. What I have found is this implementation:

        def forward(self, x, hxs, masks):
            if x.size(0) == hxs.size(0):
                x, hxs = self.rnn(
                    x.unsqueeze(0),
                    (hxs * masks.repeat(1, self._recurrent_N).unsqueeze(-1)).transpose(0, 1).contiguous())
                x = x.squeeze(0)
                hxs = hxs.transpose(0, 1)
            else:
                # x is a (T, N, -1) tensor that has been flattened to (T * N, -1)
                N = hxs.size(0)
                T = int(x.size(0) / N)

                # unflatten
                x = x.view(T, N, x.size(1))

                # Same deal with masks
                masks = masks.view(T, N)

                # Let's figure out which steps in the sequence have a zero for any agent
                # We will always assume t=0 has a zero in it as that makes the logic cleaner
                has_zeros = ((masks[1:] == 0.0)
                             .any(dim=-1)
                             .nonzero()
                             .squeeze()
                             .cpu())

                # +1 to correct the masks[1:]
                if has_zeros.dim() == 0:
                    # Deal with scalar
                    has_zeros = [has_zeros.item() + 1]
                else:
                    has_zeros = (has_zeros + 1).numpy().tolist()

                # add t=0 and t=T to the list
                has_zeros = [0] + has_zeros + [T]

                hxs = hxs.transpose(0, 1)
                outputs = []
                for i in range(len(has_zeros) - 1):
                    # We can now process steps that don't have any zeros in masks together!
                    # This is much faster
                    start_idx = has_zeros[i]
                    end_idx = has_zeros[i + 1]
                    temp = (hxs * masks[start_idx].view(1, -1, 1)
                            .repeat(self._recurrent_N, 1, 1)).contiguous()
                    rnn_scores, hxs = self.rnn(x[start_idx:end_idx], temp)
                    outputs.append(rnn_scores)

                # assert len(outputs) == T
                # x is a (T, N, -1) tensor
                x = torch.cat(outputs, dim=0)

                # flatten
                x = x.reshape(T * N, -1)
                hxs = hxs.transpose(0, 1)

    submitted by /u/No_Possibility_7588 [link] [comments]  ( 2 min )
    Does anyone know of a chess environment written in JAX?
    I don't think an open-source one exists, but figured I'd ask here because you never know what's lying around the internet! As an aside, if one doesn't exist, let me know if you're interested in partnering to write one! Edit: For anyone wondering, I need the env to be in JAX because my MuZero implementation is in JAX and I need the env to run on TPU cores, not CPU. submitted by /u/evanatyourservice [link] [comments]  ( 1 min )
    Papers that use neural networks solely for planning in large MDPS (i.e., no learning)
    I am looking for any papers that do the following: use neural networks in the RL pipeline because the state space is too large to compute the optimal policy with traditional tabular value iteration or policy iteration. In this setting, the model is completely known, i.e., there is no learning. Most papers I see on deep RL assume that the transition probabilities are unknown and that they have access to a simulator that lets them query data points. I am looking for existing work in deep RL where the transition probabilities are known but the problem is intractable with tabular methods. Any direction would be appreciated, thanks! submitted by /u/lolillini [link] [comments]  ( 1 min )
    PPO update without using NNs / batch updates
    Hello, I'm making a new post as I couldn't find any answers to this before (although this reddit post is similar to my issue). I am trying to implement a simple multivariate Gaussian policy without neural networks, basically a standard policy gradient update with SGD + the score-function gradient, without batches. The reason for this is to avoid unstable updates, meaning too-large changes to the mean/variance. The idea is thus to use a trust-region update, to keep the updates within some reasonable size. I am a little confused about the maximization of the surrogate objective. As seen in this stackoverflow post, we wish to maximize [pi/pi_old], compared to [log(pi)] in vanilla PG. Since I do not use automatic differentiation, but a single stochastic gradient descent step, how do I find the gradient of pi/pi_old? To my understanding, the flow of the algorithm is this: sample experience -> compute new policy parameters -> compare with previous policy -> construct surrogate function -> perform SGD on the surrogate to get the actual new policy. It is the last step I am struggling with. submitted by /u/Acrobatic-Ad-9189 [link] [comments]  ( 1 min )
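    For that last step, no autograd is needed: since d/dθ (π/π_old) = (π/π_old) · d/dθ log π, the surrogate gradient is just the importance ratio times the familiar score function. A sketch for the mean of a 1-D Gaussian policy (the multivariate diagonal case applies the same identity per dimension):

    ```python
    import math

    def gaussian_logpdf(a, mu, sigma):
        # log density of a univariate Gaussian at action a
        return -0.5 * ((a - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

    def surrogate_grad_mu(a, adv, mu, sigma, mu_old, sigma_old):
        """Gradient of the surrogate adv * (pi/pi_old) w.r.t. mu, using
        d/d mu (pi/pi_old) = (pi/pi_old) * d/d mu log pi."""
        ratio = math.exp(gaussian_logpdf(a, mu, sigma) - gaussian_logpdf(a, mu_old, sigma_old))
        score = (a - mu) / sigma ** 2   # d log pi / d mu for a Gaussian
        return adv * ratio * score
    ```

    At the first SGD step the new and old parameters coincide, so the ratio is 1 and this reduces to the vanilla policy gradient; the ratio only starts to matter (and the trust-region clipping with it) on subsequent updates from the same batch of experience.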
    Simulating robotic arm for object manipulation
    I'll be starting my work on object manipulation using deep RL, and I would like to start from scratch. Please recommend sources, tools, and software used for this purpose. I will not be modeling the robot itself; instead I will use any robot with a gripper that can be interfaced with ROS. Also, please link GitHub repositories that could be helpful in the learning process. Thanks submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
    policy-encoding mapping implementation
    Hi, I want to implement the policy-encoding mapping e : (S → A) → R^k from Universal Successor Features Approximators. I don't know how to feed one network into another network as an embedding; there are too many weights! Do you have any ideas? Thank you for reading! submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
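    One cheap way to get a finite-dimensional policy embedding without touching the weights at all, offered here purely as an illustration (not the paper's construction), is a behavioural embedding: evaluate the policy on a fixed set of probe states and concatenate the outputs:

    ```python
    import numpy as np

    def embed_policy(policy, probe_states):
        """Finite-dimensional embedding e: (S -> A) -> R^k, built by
        concatenating the policy's outputs on k fixed probe states.
        `policy` is any callable mapping a state to an action (or logits)."""
        return np.concatenate([np.atleast_1d(policy(s)) for s in probe_states])
    ```

    This sidesteps the too-many-weights problem because the embedding dimension is set by the number of probe states times the action dimension, not by the policy network's parameter count.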
    Useful Tools and Resources for Reinforcement Learning
    Found a useful list of Tools, Frameworks, and Resources for RL/ML. It covers Reinforcement learning, Machine Learning (TensorFlow & PyTorch), Core ML, Deep Learning, Computer Vision (CV). I thought I'd share it for anyone that's interested submitted by /u/Khaotic_Kernel [link] [comments]  ( 1 min )
    Pix2Seq: A New Language Interface for Object Detection
    Posted by Ting Chen and David Fleet, Research Scientists, Google Research, Brain Team Object detection is a long-standing computer vision task that attempts to recognize and localize all objects of interest in an image. The complexity arises when trying to identify or localize all object instances while also avoiding duplication. Existing approaches, like Faster R-CNN and DETR, are carefully designed and highly customized in the choice of architecture and loss function. This specialization of existing systems has created two major barriers: (1) it adds complexity in tuning and training the different parts of the system (e.g., region proposal network, graph matching with GIOU loss, etc.), and (2), it can reduce the ability of a model to generalize, necessitating a redesign of the model for…  ( 7 min )
    Secure AWS CodeArtifact access for isolated Amazon SageMaker notebook instances
    AWS CodeArtifact allows developers to connect internal code repositories to upstream code repositories like Pypi, Maven, or NPM. AWS CodeArtifact is a powerful addition to CI/CD workflows on AWS, but it is similarly effective for code-bases hosted on a Jupyter notebook. This is a common development paradigm for Machine Learning developers that build and train […]  ( 9 min )
    7 Ways Your Business Can Plan For Artificial Intelligence
    Artificial Intelligence is all over the world today. From the use of virtual assistants like Siri, Alexa, or Cortana, to improving…  ( 2 min )
    By Land, Sea and Space: How 5 Startups Are Using AI to Help Save the Planet
    Different parts of the globe are experiencing distinct climate challenges — severe drought, dangerous flooding, reduced biodiversity or dense air pollution. The challenges are so great that no country can solve them on their own. But innovative startups worldwide are lighting the way, demonstrating how these daunting challenges can be better understood and addressed with Read article > The post By Land, Sea and Space: How 5 Startups Are Using AI to Help Save the Planet appeared first on NVIDIA Blog.  ( 3 min )

  • Open

    Last Week in AI: Chip Startup Funding Doubled, Google Text+Image Search, Analog AI, Criminal Robotaxi
    submitted by /u/regalalgorithm [link] [comments]
    AI Dream 31 - Spaceships Galore Planet VQGAN CLIP
    submitted by /u/LordPewPew777 [link] [comments]
    What would be the best approach to auto-generate comic panels (Garfield style) with drawings and speech bubbles, assuming I have tons of scans to use as training?
    I'm a software developer but I'm not really experienced in AI. Would it be best to train first for speech bubbles and separately for panel drawings? What kind of network is the best for this? Just thinking that it would be a cool project to have auto generated legible infinite comic strips for a semi niche comic strip that runs in my country. submitted by /u/dananite [link] [comments]  ( 1 min )
    Any Recommendations for AI Content Generation Software?
    Content generation is such a time-suck for small businesses, and it seems like an interesting vertical to apply AI. The AI would generate the content after being given a prompt. There are already a few tools trying this, but the quality doesn't seem to be very high. Are there better tools that I'm missing, or is the consumer-facing software so early-stage that it would be better to hire a data scientist and train an AI system specifically for this purpose? https://www.reddit.com/r/MachinesWrite/comments/f45eav/list_of_ai_text_generators/?utm_source=share&utm_medium=web2x&context=3 https://www.reddit.com/r/juststart/comments/axa8w3/ai_ml_text_generators/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/CliffWoolum [link] [comments]  ( 1 min )
    Looking for enterprise conversational AI platform
    submitted by /u/sunstormfirefall [link] [comments]  ( 1 min )
    VICReg: Tutorial and Lightweight PyTorch Implementation blog post
    Here's a tutorial and lightweight PyTorch implementation of VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. Hope you find it helpful! submitted by /u/thejashGI [link] [comments]
    Microsoft AI Researchers Develop ‘Ekya’ To Address The Problem Of Data Drift On The Edge Compute Box And Enables Both Retraining And Inference To Co-Exist On It
    Deep neural network (DNN) models for object recognition and classification, such as YOLO, ResNet, and EfficientNet, are used in video analytics applications such as urban mobility and smart automobiles. There is a symbiotic link between edge computing and video analytics; live video analytics has been called the “killer app” for edge computing. Edge devices come in various sizes and designs, but they are always resource-constrained compared to the cloud. Video analytics deployments send the videos to on-premises edge servers. The paper addresses the difficulty of supporting inference and retraining jobs on edge servers simultaneously, which necessitates navigating the fundamental tradeoff between the accuracy of the retrained model and the accuracy of the inference. Edge computation is preferred for video analytics because it eliminates the need for expensive network lines to broadcast videos to the cloud while also preserving video privacy. Edge computation has a finite amount of resources (e.g., weak GPUs). The mismatch between the growing compute requirements of models and the total processor cycles available exacerbates this problem. As a result, model compression is used in edge deployments. Continue reading our summary of this research. Paper: https://www.microsoft.com/en-us/research/uploads/prod/2021/07/nsdi22spring-final74.pdf Github: https://github.com/edge-video-services/ekya submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Is there an AI which is able to turn normal videos into sketches like the video below?
    submitted by /u/TheblackRook3 [link] [comments]  ( 1 min )
    Automatic Summaries of your Documents in Google Docs !
    submitted by /u/OnlyProggingForFun [link] [comments]
    How to achieve a training duration on MindSpore that's less than or equal to that on TensorFlow?
    submitted by /u/Creative_Habit_6868 [link] [comments]  ( 1 min )
    CypherZilla - The First Encoded NFT Made By AI To Support Trump. Upvote If You Want To Have A Huge Impact!
    CypherZilla on OpenSea https://reddit.com/link/u8he59/video/i9a29i5ywtu81/player submitted by /u/thecypherbeast [link] [comments]
    What price do we have to pay for progress in AI? Have a look:
    https://www.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]
    Collaboration AI video and music
    submitted by /u/Recent_Coffee_2551 [link] [comments]
  • Open

    Question about trained models
    Hello, I have a question. For example, in the case of an inverted pendulum or cartpole, I train the model for the pole to be at 0 degrees (vertical) and it works. Then I want this same model to keep the pole at another position, for example 3 degrees. Do I have to retrain the model to achieve this, or can I somehow reuse what it has already learnt and give it the new position I want as input? I don't know if I explained myself well; I guess these are mostly doubts about how to interact with the model and how to properly use a model that has already been trained. If anyone has some example code (Python, Gym) on interacting with a trained model, it would be really helpful. submitted by /u/Sleyck [link] [comments]  ( 1 min )
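    One common answer to the question above is a goal-conditioned policy: instead of retraining for each target angle, append the desired angle to the observation during training so a single policy can track any goal. A minimal sketch (all dimensions and ranges here are hypothetical, not taken from a specific cartpole implementation):

```python
import numpy as np

# Goal-conditioned sketch: append the target pole angle to the observation
# so one policy can track any goal. A model trained only with a 0-degree
# target has never seen other goals, so it generally must be (re)trained
# with goals sampled at reset time.
def goal_conditioned_obs(obs, target_angle_deg):
    return np.concatenate([obs, [np.radians(target_angle_deg)]])

rng = np.random.default_rng(0)
goal = rng.uniform(-5.0, 5.0)      # sample a goal angle (degrees) per episode
obs = np.zeros(4)                  # cartpole-style observation
aug = goal_conditioned_obs(obs, goal)
print(aug.shape)                   # (5,)
```

    At test time the same trained policy is queried with the new target (e.g. 3 degrees) in the goal slot; no retraining loop is needed.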
    Why is this implementation of PPO using a replay buffer?
    https://github.com/marlbenchmark/on-policy/blob/main/onpolicy/algorithms/r_mappo/r_mappo.py submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    What is the role of masks in the computation of GAE?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Question About Optimal Policy Guarantees in POMDPs
    I'm working on a project where I'm trying to prove the existence of a particular set of functions by showing it can be constructed as the solution to a Markov Decision Process. However, it seems that it's much simpler to convert it to a partially observable MDP, rather than a classic one. I know it's been proven that the set of optimal policies for a classic MDP is nonempty, and intuitively I feel like the same should hold for POMDPs, but I'm having a hard time finding a particular source proving such a thing. Does anyone know where I ought to look? submitted by /u/LessPoliticalAccount [link] [comments]  ( 1 min )
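    For context on why the intuition above is plausible: a POMDP can be recast as a fully observable MDP over belief states, and for a finite MDP the Bellman optimality operator is a γ-contraction, so it has a unique fixed point and a greedy stationary policy with respect to it is optimal. A toy value-iteration sketch of that nonemptiness argument (all numbers hypothetical):

```python
import numpy as np

# Value iteration on a tiny finite MDP. The Bellman optimality operator is
# a gamma-contraction, so it has a unique fixed point V*, and any policy
# greedy w.r.t. V* is optimal -- hence the set of optimal stationary
# policies is nonempty. The hope is to carry the same argument over to the
# belief MDP of a POMDP (the belief space is continuous, so extra care is
# needed there).
P = np.array([                      # P[a, s, s'] transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.7, 0.3]],
])
R = np.array([[1.0, 0.0],           # R[a, s] expected immediate reward
              [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * (P @ V)         # Q[a, s] = R[a, s] + gamma * E[V(s')]
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

pi = Q.argmax(axis=0)               # one optimal stationary policy
```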
    Can reinforcement learning learn itself? A reply to 'Reward is enough' (PDF)
    submitted by /u/JBaloney [link] [comments]  ( 1 min )
    What is this line in the Sutton/Barto textbook referring to?
    In the first edition of the textbook, the section on actor-critic methods (link) describes the classical approach of using the temporal difference error 𝛿 to modify the probability of selecting action a in state s: https://preview.redd.it/fa35vut7ewu81.png?width=238&format=png&auto=webp&s=c1b8952b065a90ecd2b8c0c30b985e36d37dbc30 Then they briefly mention that one variation on the classical approach is to scale temporal difference error 𝛿 by the inverse of the probability of selecting the action a, where that probability is given by 𝜋(s, a): https://preview.redd.it/69gd3zmbdwu81.png?width=375&format=png&auto=webp&s=b66a7d5eef3c2b7bc256473aed728223921a751c They say: " These issues were explored early on, primarily for the immediate reward case (Sutton, 1984; Williams, 1992) and have not been brought fully up to date." This idea is relevant to a project I'm working on, and I'd like to read more about it. But the references seem to be dead ends: Sutton 1984 is his PhD thesis, which I can't find a digital copy of, and Williams 1992 is this paper, which doesn't seem to contain this idea. Also this section doesn't seem to appear in the second edition of the textbook. You folks are much smarter than I am: Does modifying the update in this way mean anything to you? Are there modern approaches that do something like this? Or should I assume it was a little-explored idea in the early days that has been more-or-less forgotten? Thanks very much! submitted by /u/Careless-Argument-37 [link] [comments]  ( 2 min )
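    For readers puzzling over the same passage: the variation scales the TD error by the inverse of the action probability, so rarely chosen actions receive proportionally larger preference updates; the expected update over sampled actions then matches the update every action would get deterministically, which is the same idea as the inverse-propensity (importance) weighting used in modern off-policy corrections. A tabular sketch with made-up numbers:

```python
import numpy as np

# Tabular actor-critic sketch of the quoted variation: scale the TD error
# delta by 1 / pi(s, a) before updating the action preference. All numbers
# are made up; this only illustrates the shape of the update.
n_states, n_actions = 3, 2
prefs = np.zeros((n_states, n_actions))    # action preferences p(s, a)
alpha = 0.1

def policy(s):
    """Softmax over preferences."""
    e = np.exp(prefs[s] - prefs[s].max())
    return e / e.sum()

s, a, delta = 0, 1, 0.5                    # one sampled transition's TD error
pi_sa = policy(s)[a]                       # 0.5 here (uniform at start)
prefs[s, a] += alpha * delta / pi_sa       # inverse-propensity-scaled update
print(prefs[s, a])                         # 0.1
```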
    Reinforcement Learning with delays
    I was wondering what methods there are for RL with time delay other than augmenting the state space with the action buffer or using a model to undelay the environment. I've seen this post How to deal with the time delay in reinforcement learning? - Artificial Intelligence Stack Exchange however it's rather brief and I wondered if there were any more recent advancements. I am also struggling to understand partial trajectory resampling ( 2010.02966.pdf (arxiv.org) ) and the code in the accompanying repo. GitHub - rmst/rlrd: PyTorch implementation of our paper Reinforcement Learning with Random Delays (ICLR 2020) I was wondering how we can resample actions in environments with constant delays if those actions are used in the state space for all subsequent chosen actions? submitted by /u/SuperDuperDooken [link] [comments]  ( 1 min )
    Is it stupid to use rl to control solar panel angle?
    submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
    How can I use the environment in Emergence of Locomotion Behaviours in Rich Environments?
    Hi, I want to train my agent in the environment used in "Emergence of Locomotion Behaviours in Rich Environments". Here is a video about that https://www.youtube.com/watch?v=hx_bgoTF7bs. Is the environment released? Thanks for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
  • Open

    [P] mGPT model released: a multilingual GPT-3-like model for 61 languages
    Hi everyone. Today we released the mGPT model: a multilingual generative pre-trained transformer. The checkpoints are available on the Huggingface model page. The example usage is at the Github repo https://github.com/ai-forever/mgpt The model has 1.3 billion parameters and a context length of 512 tokens. The model can generate sequences after the input prompt, and can be used for fine-tuning or for zero- and few-shot learning:
    from transformers import GPT2LMHeadModel, GPT2Tokenizer
    model_name = "sberbank-ai/mGPT"
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.cuda()
    model.eval()
    texts = [
        "My favourite holiday is ",
        "Իմ սիրելի տոնն է ",
        "Моє улюблене свято ",
        "mi fiesta favorita es ",
        "मेरी पसंदीदा छुट्टी है",
        "我最喜欢的节日是",
        "Min…  ( 2 min )
    [P] Deep Learning GPU Benchmark: A Latency-Based Approach
    Hi r/MachineLearning! I want to share with you a fun side project of mine on benchmarking the GPUs for deep learning: [project page]. https://preview.redd.it/7olwqyze5yu81.png?width=2041&format=png&auto=webp&s=25aecb9733366720a2be5cecc2048eb2a734c9b9 Here are some key features: It helps to estimate the runtime of algorithms on a different GPU. It measures GPU processing speed independent of GPU memory capacity. It contains adjustable weightings through interactive UIs. The project page also explains how this benchmark differs from existing ones, and why this benchmark is more relevant to academic research. I would love to know what you think! submitted by /u/roll-a-dice [link] [comments]  ( 1 min )
    [D] What's your perfect laptop for deep learning research?
    I'm using an MBP 2015; it's a pretty solid laptop and I like it a lot, though it feels slow and I've started to look for a replacement. Given that I run all experiments on GPU-dedicated servers, my laptop serves me as a typewriter; it's ok, but I'd like to get more out of it. Frankly I'm a bit disappointed by the 2021 MacBooks; hope they'll be improved in 2022. Recently Lambda Labs together with Razer announced their Tensorbook https://lambdalabs.com/deep-learning/laptops/tensorbook , their pricing looks weird to me: the more you pay, the more years of support you have, and that's the only thing which differentiates the base bundle from enterprise. Also there is no option to customize its hardware, though the basic bundle itself looks ok; its price is $3500, like the M1 Max's. What's your opinion about this laptop in particular? Would you buy it? Generally this laptop looks like a cool thing to have for local model development, even from a tent somewhere in Nepal, given that you have enough power banks to charge it. :) What's your choice of a laptop for DL? My biggest requirement is a durable laptop which will serve at least 5 years, preferably with an NVIDIA GPU for development and debugging. submitted by /u/taras-sereda [link] [comments]  ( 1 min )
    [R] Deep models of superficial face judgments (PNAS)
    ​ Transformations that alter the perception of target faces Paper: https://www.pnas.org/doi/10.1073/pnas.2115228119 Dataset: https://onemillionimpressions.com/ submitted by /u/joshuacpeterson [link] [comments]
    [R] Planting Undetectable Backdoors in Machine Learning Models
    submitted by /u/Wiskkey [link] [comments]
    [P] VICReg: Tutorial and Lightweight PyTorch Implementation blog post
    Here's a tutorial and lightweight PyTorch implementation of VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. Hope you find it helpful! submitted by /u/thejashGI [link] [comments]
    [P] Announcing cleanlab 2.0: Automatically Find Errors in ML Datasets
    Hi folks. This morning I released the new cleanlab 2.0 Python package for automatically finding errors in datasets and machine learning/analytics with real-world, messy data and labels. tl;dr - cleanlab provides a framework to streamline data-centric AI. https://preview.redd.it/hq1kyasvwwu81.png?width=2279&format=png&auto=webp&s=4fa3c82ec66d685c8fc4f95c5d9a0fc4be192d6b After the 1.0 launch last year, engineers used cleanlab at Google to clean and train robust models on speech data, at Amazon to estimate how often the Alexa device doesn’t wake, at Wells Fargo to train reliable financial prediction models, and at Microsoft, Tesla, Facebook, etc. Joined by two good friends from grad school, we completely rebuilt cleanlab 2.0 to work for all data scientists, ML datasets, and models; and hit a…  ( 2 min )
    [P] Galp Hackathon - Win 10.000€ from home!
    If you are passionate about Data & AI we have the perfect challenge for you! The applications for Galp’s Hackathon Retail 4.0 are OPEN! With this Hackathon, Galp is challenging the community to propose solutions to specific problems and use cases that they think could improve their typical customer journey in the service stations. Gather a team and come up with an innovative solution for a chance of winning 10.000€! Let’s shape the future of Galp's retail? Apply now: https://taikai.network/en/galp/hackathons/retail40 https://preview.redd.it/wkfb6ybuwwu81.png?width=3334&format=png&auto=webp&s=deef13767df5ba607e387ce4e278ae3981d93582 submitted by /u/migueldsalmeida [link] [comments]  ( 1 min )
    [D] Imbalanced multi class classification 📌
    I'm working on a machine learning problem: multi-class classification with an imbalanced class distribution, so obviously my model favours classes with more data and fails to predict classes with little data. What techniques can I use to help the model distinguish all the classes equally well? P.S. I'm avoiding SMOTE so that the model trains on real data rather than generated samples. submitted by /u/According-Promise-23 [link] [comments]  ( 2 min )
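    One SMOTE-free option that fits the constraint above is cost-sensitive training: reweight the loss with inverse-frequency class weights computed from the labels alone (toy labels below), so misclassifying a rare class costs more:

```python
import numpy as np

# Inverse-frequency class weights: rare classes get large weights.
y = np.array([0] * 90 + [1] * 8 + [2] * 2)     # imbalanced toy labels
classes, counts = np.unique(y, return_counts=True)
weights = counts.sum() / (len(classes) * counts)
class_weight = dict(zip(classes.tolist(), weights.tolist()))
print({c: round(w, 3) for c, w in class_weight.items()})
# → {0: 0.37, 1: 4.167, 2: 16.667}
```

    This is the same heuristic as scikit-learn's "balanced" mode in compute_class_weight; many frameworks accept such a dict directly (e.g. the class_weight argument of Keras fit), though it is worth checking the exact API of your version.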
    [R] CVPR 2022 - Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing
    submitted by /u/SleekEagle [link] [comments]
    [R] My continuously updated machine learning research notes
    Dear ML researchers, For many years I've been continuously updating my machine learning research notes for my PhD students and everyone online. I don't like uploading to arXiv just to get "citations", and GitHub serves me well. Hope they are useful for you: https://github.com/roboticcam/machine-learning-notes Richard submitted by /u/MLknowledge [link] [comments]  ( 1 min )
    [D] Correcting for imbalance in regression datasets
    Hi, I am performing an image → scalar regression. The output scalar I am trying to estimate follows a roughly Gaussian distribution. I notice that the DNN output is biased towards the mean (makes sense). This seems like a problem of imbalanced data. For classification, I can oversample minority classes. What is the equivalent for regression? Is there a technique where we oversample "outliers" and undersample central values? submitted by /u/rsandler [link] [comments]  ( 1 min )
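    A common regression analogue of the oversampling idea above is to discretize the target into bins and weight (or resample) each example by the inverse frequency of its bin. A sketch on synthetic Gaussian targets (the bin count and normalization are arbitrary choices):

```python
import numpy as np

# Bin the continuous target, then give each example a weight inversely
# proportional to its bin's frequency, so tail values influence the loss
# as much as values near the mean.
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)                    # synthetic Gaussian targets

counts, edges = np.histogram(y, bins=20)
counts = np.maximum(counts, 1)                 # guard against empty bins
bin_idx = np.clip(np.digitize(y, edges[1:-1]), 0, len(counts) - 1)
w = 1.0 / counts[bin_idx]
w *= len(w) / w.sum()                          # normalize to mean weight 1
```

    The weights w would then be passed as per-example weights to the training loss (e.g. the sample_weight argument of Keras fit); the same bins can instead drive over/undersampling if weighting is awkward in your setup.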
    Building Dense Passage Retrievers [P]
    Hi, I made a video explaining the ideas behind building a Dense Passage Retriever(DPR). Whenever we talk about retrievers, we mostly refer to the DPR formulation which appeared in this paper. A lot of publicly available implementations also use this formulation. In a previous video, we discussed how to use the DPR End-to-End QA system which uses DPR with a QA model. In this video, we solely focus on retrievers and the ideas behind building them. The implementation is quite similar to retrievers pre-trained with Inverse Close Task. This video is part 8 of 9 video series on Open-domain question answering using Dense retrievers. Thanks for the support and I will appreciate any feedback. https://www.youtube.com/watch?v=w61p0HLo7gc submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    [D] How do you usually run sanity checks when training GANs ?
    Hi, I have been studying super-resolution with GANs and took a look at SRGAN and ESRGAN. I have spent the whole day running experiments to see whether I can manage to overfit on a single batch of 16 / 32 / 128 examples (MNIST). I have found that it's almost impossible to use this tactic as a sanity check because the model simply cannot generate good quality samples. I would like to hear your thoughts on this, and how you would run sanity checks for GANs. Thank you! submitted by /u/Frizzoux [link] [comments]  ( 1 min )
    [D] Amazon Releases a New Multilingual Dataset for NLU
    https://www.amazon.science/blog/amazon-releases-51-language-dataset-for-language-understanding submitted by /u/__lawless [link] [comments]
    [D] How to handle features that apply to a whole csv-file vs single rows?
    Hi all, I have csv-files (~300) with a fixed set of columns (~40) but varying number of rows (sum of all rows ~300 000) and multiple labels per csv that I want to predict. Because of the limited number of csv-files and as a first try I am predicting the labels row-wise (attaching the label to all the rows of one csv-file) which works well for some labels but not for others. Currently, I am calculating some features for every row and just appending them to the row and some features for the whole csv-file and appending them to every row. Two problems are now arising that I would like to hear some input about: The number of features per csv is growing and it seems like a waste to copy them to every row. For some labels it is probably reasonable to throw away most of the rows and only feed in a handful. How would you design a structure that incorporates the limited number of csv-files and the different ways to treat features (row vs. csv)? submitted by /u/tlklk [link] [comments]  ( 2 min )
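    For problem 1 above, a simple structure (all names and values purely illustrative) is to keep file-level features in a lookup table keyed by file id and concatenate them with row-level features only when a batch is assembled, instead of physically copying them onto every row:

```python
# Join the ~40 per-file features with row features at batch-build time
# rather than duplicating them across every row of the CSV.
file_feats = {"a.csv": [0.2, 7.0],
              "b.csv": [0.9, 3.0]}             # per-file feature vectors
rows = [("a.csv", [1.0, 2.0]),                 # (file_id, row features)
        ("b.csv", [3.0, 4.0])]

def make_example(file_id, row_features):
    return row_features + file_feats[file_id]  # concat only when needed

batch = [make_example(fid, r) for fid, r in rows]
print(batch[0])                                # [1.0, 2.0, 0.2, 7.0]
```

    A step further is a two-branch model: one branch encodes (a subset of) the rows, another encodes the file-level vector once, and their embeddings are combined, which also addresses feeding in only a handful of rows per file for some labels.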
    [R][P] Differences in publishing a paper at a conference and in a journal?
    Hi! I am an undergrad and I am going to start my MS in CS this fall. My research interest is mainly in multimodal learning for language and speech. I have written papers before, but both have been peer-reviewed journal papers (Knowledge-Based Systems, Elsevier) [1] [2]. I now want to start publishing in conferences, since I have noticed that it is much easier to get noticed and receive reviews when the paper is presented at a conference. I want to understand how different the publication process is for conferences. I would also like recommendations on conferences in the NLP and speech area, considering this will be my first conference paper. Thanks! (I would also appreciate reviews on my papers if anyone has the time to look them over. Thanks!) submitted by /u/prabhav55221 [link] [comments]  ( 4 min )
    [D] How do you get the maximum of arxiv sanity?
    Basically, I don't want to phrase this as a "how-to" post, but arxiv-sanity-lite really bothers me. How do you guys find recent, promising papers in your area of interest besides following what is published at major conferences? I believe the website is "too lightweight". For example, what if I am interested in computer vision papers and I specify that in the tags field (i.e. explicitly typing "computer vision")? How can I rank the papers by a score (basically the goodness of the paper)? Why do shortcut links like "recommend over last week" or "recommend over last 3 days" always (at least for me) end up with 0 results? I've never used the original arxiv-sanity before, so I strongly believe there is something I am missing. submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 1 min )
    [N] New opportunity: PhD Candidate within multisensor data fusion and applied machine learning for analysis of Arctic sea ice
    The Norwegian University of Science and Technology (NTNU) has a vacancy for a PhD Candidate within the DIGITALSEAICE project. The project aims to build a multi-scale digital infrastructure that integrates local and regional sea ice models for improved forecasting and understanding of variations in polar ice conditions. More information here: https://www.jobbnorge.no/en/available-jobs/job/224802/ submitted by /u/KatjaKim [link] [comments]  ( 1 min )
    [P] Efficient Deep Learning Book
    We are working on a book that focuses on deep learning efficiency techniques such as quantization, pruning, distillation, etc. for both server-side as well as on-device (smartphones, IoT, etc.) applications. The goal is to introduce these ideas in a single place, without having to parse many papers, try to get a working code sample, and then spend time debugging. With the accompanying codelabs, we hope that our readers can make their models 4-20x smaller, faster, and better in quality. We have released the first four chapter's draft PDFs, and would truly appreciate any sort of comments / feedback. Book: efficientdlbook.com Feedback: hello@efficientdlbook.com submitted by /u/EfficientDLBook [link] [comments]  ( 1 min )
    [D] Interview w/ Google Brain researchers on Sparse Expert Models (Switch Transformers, GLAM, and more...)
    https://youtu.be/ccBMRryxGog This video is an interview with Barret Zoph and William Fedus of Google Brain about Sparse Expert Models. Sparse expert models have been hugely successful at distributing parts of models, mostly Transformers, across large arrays of machines, using a routing function to effectively route signals between them. This means that even though these models have a huge number of parameters, the computational load for a given signal does not increase because the model is only sparsely activated. Sparse expert models, such as Switch Transformers and GLaM, can scale up to trillions of parameters and bring a number of desirable properties. We discuss everything from the fundamentals, history, strengths and weaknesses, up to the current state of the art of these models.
    OUTLINE:
    0:00 - Intro
    0:30 - What are sparse expert models?
    4:25 - Start of Interview
    5:55 - What do you mean by sparse experts?
    8:10 - How does routing work in these models?
    12:10 - What is the history of sparse experts?
    14:45 - What does an individual expert learn?
    19:25 - When are these models appropriate?
    22:30 - How comparable are sparse to dense models?
    26:30 - How does the pathways system connect to this?
    28:45 - What improvements did GLaM make?
    31:30 - The "designing sparse experts" paper
    37:45 - Can experts be frozen during training?
    41:20 - Can the routing function be improved?
    47:15 - Can experts be distributed beyond data centers?
    50:20 - Are there sparse experts for other domains than NLP?
    52:15 - Are sparse and dense models in competition?
    53:35 - Where do we go from here?
    56:30 - How can people get started with this?
    Papers:
    Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (https://arxiv.org/abs/2101.03961)
    GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (https://arxiv.org/abs/2112.06905)
    Designing Effective Sparse Expert Models (https://arxiv.org/abs/2202.08906)
    submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [P] Interactive semantic map of ICLR 2022
    Next week ICLR 2022 is taking place, fully virtual with 1000+ high quality papers. To make sense of this volume of papers, we have indexed them and provide an interactive semantic map of #ICLR2022, check out: https://search.zeta-alpha.com/?q=&d=ly&doc_sources=ICLR&sort_by=authority To enjoy the full map, click on [Explore more] and then enter full screen mode. We will also discuss the program and 10 must-read papers in the Zeta Alpha "Trends in AI" ICLR edition webinar Monday 25th, for which you can sign up here: https://us06web.zoom.us/webinar/register/7816505274568/WN_82DzwhXZQbOCSTWgaI9xMw Looking forward to meeting you online at ICLR 2022! https://preview.redd.it/6wdqj4ru7uu81.jpg?width=2202&format=pjpg&auto=webp&s=c97417c9ea39919041949bf3aa38ad33bb6eca5a submitted by /u/EngineerZetaAlpha [link] [comments]  ( 1 min )
    [R] MindSpore Paper Interpretation: MIEHDR CNN: Main Image Enhancement based Ghost-Free High Dynamic Range Imaging using Dual-Lens Systems
    This article is reproduced from Zhihu and translated by DeepL for enthusiasts to communicate. 1. Research Background High dynamic range (HDR) imaging is mainly oriented towards picture display technology. In a given scene, if the range between the brightest and darkest areas exceeds the maximum luminance range the image can represent, the display quality is greatly reduced; HDR addresses this by recording a wider luminance range, yielding a more faithful display. Current solutions for generating HDR images focus on fusing two low dynamic range (LDR) images of different exposures taken with the same camera. In such a solution, camera shake or object movement during the exposure time can produce the proble…  ( 3 min )
    [D] Most efficient way to use large image datasets with clusters for ML?
    I am having trouble finding some general information on this subject. I know I am down the rabbit hole when Google doesn't have an answer. I want to know best practices and information on using clusters for machine learning with large amounts of data. I believe I have a close-to-optimal solution but wanted to get some other opinions. My current setup: AWS EKS (Kubernetes) for the cluster; Kubeflow as the ML platform; Katib for HPT jobs; PyTorch for custom models; spot-instance GPUs; Lustre for serving files to the models. My data: millions of images stored in S3, ~50TB in total. What is the most efficient way to move my data to the cluster? My current approach: preprocess the data with a dedicated instance and store it in S3; the master runs on a dedicated node; Katib spins up a set number of GPU spot nodes; a claim is made, and an FSx Lustre file system is generated for the pod. Advantages: very fast training and data movement with spot training. Disadvantages: I have to spin up several Lustre systems for the training. Possible alternative: same as above, but use EFS as a distributed file system so I don't have to wait for Lustre. Advantages: potentially cheaper as I have only one FS. Disadvantages: slow throughput; I read this was a bad idea. Other alternatives: use a PyTorch streaming function with S3 (boosted transfer speed); have every pod download the data to an EBS volume; give up and switch to SageMaker. Anyone with experience in these technologies, I would really appreciate hearing your thoughts. submitted by /u/thewineiswater [link] [comments]  ( 1 min )
    [D] [P] Neural network: same prediction for different inputs
    I am getting the same prediction for different inputs. I am trying to use a regression neural network. Since the data is huge, I am training one example at a time. Here is a simplified version of my code:
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    model.add(Dense(10000, input_dim=212207, kernel_initializer='normal', activation='relu'))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')

    for i in range(10000000):
        # X is an input with 212207 values; Y is an output value
        if i < 6000000:
            model.fit(X.transpose(), Y, epochs=30, batch_size=1, verbose=0)
        else:
            prediction = model.predict(X.transpose())
    I made sure that I am training on different examples and trying predictions on different examples. I am still getting the same prediction value for all testing inputs. I think I made some mistake in defining the model for the regression neural network. Can you please check if the code is correct? submitted by /u/exoplanet_hunter [link] [comments]  ( 1 min )
  • Open

    Fixed points of bilinear transformations
    Introduction I was puzzled the first time I saw bilinear transformations, also known as Möbius transformations. I was in a class where everything had been abstract and general, and suddenly things got very concrete and specific. I wondered why we had changed gears, and I wondered how there could be much to say about something […] Fixed points of bilinear transformations first appeared on John D. Cook.  ( 2 min )
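    For reference, the fixed points the post discusses are the solutions of z = T(z) for T(z) = (az + b)/(cz + d), which reduces to a quadratic:

```latex
z = \frac{az + b}{cz + d}
\iff c z^2 + (d - a) z - b = 0
\iff z = \frac{(a - d) \pm \sqrt{(a - d)^2 + 4bc}}{2c} \quad (c \neq 0).
```

    When c = 0 (an affine map), the finite fixed point is b/(d − a) provided a ≠ d, and the point at infinity is always fixed.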
    Partitioning complexity
    This post looks at how to partition complexity between definitions and theorems, and why it’s useful to be able to partition things more than one way. Quadratic equations Imagine the following dialog in an algebra class. “Quadratic equations always have two roots.” “But what about (x – 5)² = 0. That just has one root, […] Partitioning complexity first appeared on John D. Cook.  ( 4 min )
  • Open

    Hidden Interfaces for Ambient Computing
    Posted by Alex Olwal, Research Scientist, Google Augmented Reality and Artem Dementyev, Hardware Engineer, Google Research As consumer electronics and internet-connected appliances are becoming more common, homes are beginning to embrace various types of connected devices that offer functionality like music control, voice assistance, and home automation. A graceful integration of devices requires adaptation to existing aesthetics and user styles rather than simply adding screens, which can easily disrupt a visual space, especially when they become monolithic surfaces or black screens when powered down or not actively used. Thus there is an increasing desire to create connected ambient computing devices and appliances that can preserve the aesthetics of everyday materials, while providing …  ( 7 min )
  • Open

    Specify and extract information from documents using the new Queries feature in Amazon Textract
    Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. Amazon Textract now offers the flexibility to specify the data you need to extract from documents using the new Queries feature within the Analyze Document API. You don’t need to know the structure of the […]  ( 11 min )
  • Open

    Understanding the Difference between Loss Functions and Metrics in Machine Learning/Deep Learning
    Yes! You read the heading right. There's indeed a difference between loss functions and metrics in the field of machine learning. However…  ( 2 min )
  • Open

    A new state of the art for unsupervised vision
    MIT CSAIL scientists created an algorithm to solve one of the hardest tasks in computer vision: assigning a label to every pixel in the world, without human supervision.  ( 7 min )
    Anticipating others’ behavior on the road
    A new machine-learning system may someday help driverless cars predict the next moves of nearby drivers, cyclists, and pedestrians in real-time.  ( 7 min )
  • Open

    Tooth Tech: AI Takes Bite Out of Dental Slide Misses by Assisting Doctors
    Your next trip to the dentist might offer a taste of AI. Pearl, a West Hollywood startup, provides AI for dental images to assist in diagnosis. It landed FDA clearance last month, the first to get such a go-ahead for dentistry AI. The approval paves the way for its use in clinics across the United Read article > The post Tooth Tech: AI Takes Bite Out of Dental Slide Misses by Assisting Doctors appeared first on NVIDIA Blog.  ( 4 min )
    GFN Thursday Is Fit for the Gods: ‘God of War’ Arrives on GeForce NOW
    The gods must be smiling this GFN Thursday — God of War today joins the GeForce NOW library. Sony Interactive Entertainment and Santa Monica Studios’ masterpiece is available to stream from GeForce NOW servers, across nearly all devices and at up to 1440p and 120 frames per second for RTX 3080 members. Get ready to Read article > The post GFN Thursday Is Fit for the Gods: ‘God of War’ Arrives on GeForce NOW appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Building Dense Passage Retrievers
    Hi, I made a video explaining the ideas behind building a Dense Passage Retriever (DPR). Whenever we talk about retrievers, we mostly refer to the DPR formulation which appeared in this paper. A lot of publicly available implementations also use this formulation. In a previous video, we discussed how to use the DPR end-to-end QA system, which uses DPR with a QA model. In this video, we solely focus on retrievers and the ideas behind building them. The implementation is quite similar to retrievers pre-trained with the Inverse Cloze Task. This video is part 8 of a 9-video series on open-domain question answering using dense retrievers. Thanks for the support and I will appreciate any feedback. https://www.youtube.com/watch?v=w61p0HLo7gc submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    NN from Scratch: #4 Backward Propagation | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
    Searching for volunteers for ML-based Ukrainian volunteer project.
    We are searching for trustworthy volunteers with some free time who would like to contribute to a digital Ukrainian volunteer project. Our system heavily relies on an image recognition system with a number of specialized filters involving facial recognition, object recognition, logo detection, photoshop detection, etc. People with professional experience in any of these areas are preferred, but novice ML people are welcome to join us in a different capacity. DM to learn more about the project; glad to discuss the details with you. submitted by /u/eelgirl [link] [comments]  ( 1 min )
  • Open

    18 Differences Between Good and Great Data Scientists
    If you are employed as a data scientist and have survived (or thrived!) in your position for more than a year, chances are you are at least a good data scientist. This is particularly true if you were promoted. The difference between a mediocre and a good data scientist will be the topic of a… Read More »18 Differences Between Good and Great Data Scientists The post 18 Differences Between Good and Great Data Scientists appeared first on Data Science Central.  ( 6 min )

  • Open

    Is there any difference between how DDPG and PPO use the replay buffer?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Any tips for a prospective graduate student in Reinforcement Learning?
    Hello Everyone, I apologize ahead of time if posts like this aren't looked well upon on this sub, but I couldn't find rules against this and I also think this is the best, most niche sub for my question. I also made a new account just to be safe haha. ​ Anyways, I will be graduating this spring with a BS in Computer Science and a BA in Mathematics. I have been researching Machine Learning since my sophomore year (adversarial machine learning) under a professor at my university and recently took upon a second, concurrent research position in RL since last summer. ​ My goal is to get into a PhD program at a higher level than my current university (my current university is good, but doesn't really have much of an AI focus as I've already taken all the AI grad courses as an undergrad). I'm…  ( 3 min )
    Task Allocation problem with graph representation
    Hey everyone, I've recently started working on a task allocation problem using RL. I'd just like to make sure my thinking is correct on how best to approach the problem. At the moment, we have (effectively) a graph traversal sim for n agents, where the goal is to minimize the total distance over an episode, as determined by setting the correct tasks. The task supplied to each agent will determine the route that is taken, and therefore the distance. The current idea is to supply an input graph that also contains information on the current location of the agents. A second input would be the set of available tasks. The expected output would be produced through a pointer network, where we produce a reordered set of the tasks in descending order of optimality. When step is called, the sim runs until a new task is needed (an agent completes its route). In general, does anyone know a good way to represent the inputs and output of this problem? A pointer network seems like it could work to produce actions, but if I need to do a forward pass for every agent, it seems that there would be no consideration of other agents when determining tasks (we shouldn't have 2 agents doing the same task). For the graph representation, a graph NN seems like an obvious choice, but I just wanted to see if anyone had any insight on why they may or may not be used. submitted by /u/asdfsflhasdfa [link] [comments]  ( 1 min )
    Universities working on reinforcement learning for robotics.
    Can you name any good universities (with high acceptance rates) that are working on reinforcement learning for robotics and also accept students from other branches (i.e. Electrical or Mechanical Engineering)? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
    Is the game of chess a finite MDP?
    In the standard intro to RL book, I have read that any MDP that has finite actions and states is a finite MDP. But that limit is subjective. Chess has approximately 10^45 states. If I limit myself to 10^5 states, can I say that chess isn't a finite MDP? submitted by /u/BraveProfessional656 [link] [comments]  ( 1 min )
    Reinforcement learning over traditional machine learning method in Finance/Banking ?
    I am currently studying use cases of RL in finance/banking/insurance and I am keen to understand its advantages and disadvantages compared with traditional methods. submitted by /u/kachua26 [link] [comments]  ( 1 min )
  • Open

    FormNet: Beyond Sequential Modeling for Form-Based Document Understanding
    Posted by Chen-Yu Lee and Chun-Liang Li, Research Scientists, Google Research, Cloud AI Team Form-based document understanding is a growing research topic because of its practical potential for automatically converting unstructured text data into structured information to gain insight about a document’s contents. Recent sequence modeling, which is a self-attention mechanism that directly models relationships between all words in a selection of text, has demonstrated state-of-the-art performance on natural language tasks. A natural approach to handle form document understanding tasks is to first serialize the form documents (usually in a left-to-right, top-to-bottom fashion) and then apply state-of-the-art sequence models to them. However, form documents often have more complex layouts …  ( 8 min )
  • Open

    [D] Who are using physics informed neural networks (PINN) in the industry?
    I stumbled upon this JD from Hitachi Energy, which mentions PINN in the section of preferred background: https://www.linkedin.com/jobs/view/2923292435/ Is PINN gaining more attention? And are there more players? submitted by /u/Kohomologia [link] [comments]  ( 1 min )
    [D] Is quantum AI a real thing? (from the software perspective)
    Hi all, I'm keeping an eye on the state of the art in quantum hardware, but what about software? I can think of many questions and maybe some of you are in the field. What should be the impact of quantum computing on ML/DL, realistically? What might a roadmap for the software look like? And do quantum simulators already offer any benefits for AI? What are the best projects out there? I've seen many but haven't been very convinced. submitted by /u/IntelligentHat1657 [link] [comments]  ( 2 min )
    [D] A fairer AI freelancer marketplace that cares about freelancers' career advancement and benefits
    Hi, ML freelancers. I'm starting a freelancing marketplace tailored only for AI talent, and I especially care about the welfare of freelancers, so I plan to add these: (1) you will be treated more like employees of the platform, so we provide training (for all), potentially a health care plan (for people who have stably worked >20 hours a week), and a career advancement plan, with mentors from experienced freelancers you get to learn from; (2) open discussion between employers and you, so that you can scope the project better and set a reasonable rate and timeline; (3) we potentially provide MLOps tools to improve your productivity; (4) we avoid global competition by matching businesses only with local-region freelancers or areas that are more expensive. How attractive do you think this will be? And are any of these benefits already provided by Upwork, Freelancer, Toptal, or Fiverr? submitted by /u/meame2010 [link] [comments]  ( 2 min )
    [D] Diffusion models video tutorial
    Diffusion models have been behind a recent string of impressive generative results, including OpenAI's DALL-E 2. They’re powered by a simple yet expressive core mechanism. New video covering how they work: https://youtu.be/fbLgFrlTnGU submitted by /u/ariseff [link] [comments]
    [D] Building the Model Behind DoorDash’s Expansive Merchant Selection
    Interested in how DoorDash maintains a well-performing and diverse selection in the numerous markets they operate in, despite entering the delivery market relatively late? I had the opportunity to collaborate on this project, which involved building a number of models that measured customer preferences, identified market cuisine categories, and predicted merchants' performance on the platform. I wanted to share the approach and some of the technical details with the ML community to get feedback on what we can improve and to show this cool use case to others working on similar sales-enablement models. Check out the blog post I wrote and let me know what you think of our approach. Building the Model Behind DoorDash's Expansive Merchant Selection submitted by /u/EfficientString7431 [link] [comments]  ( 1 min )
    [D] What's hot in deep learning research at the moment ?
    I took a break from deep learning (starting last October); now I want to get back, start a new project, and read papers. Where should I focus? Should I keep working on vision transformers or maybe start something on geometric deep learning? What's hot and what's going on? submitted by /u/ovotheking [link] [comments]  ( 1 min )
    [P] A simple PyTorch YOLOv1 training pipeline GitHub Repo
    https://github.com/sovit-123/yolov1_pytorch_voc07 ​ Also, I write about Deep Learning and Machine Learning on https://debuggercafe.com/ Please check it out and let me know if somebody wants any blog posts on a specific topic. submitted by /u/sovit-123 [link] [comments]
    [P] Programmatic: Powerful Weak Labeling
    Hi all! Really excited to share a project we've been working on and get your feedback! We've made Programmatic — an NLP annotation tool for building large labeled datasets for NLP without manual annotation. Programmatic is like a REPL for data annotation. You: 1. Write simple rules/functions that can approximately label the data 2. Get near-instant feedback across your entire corpus 3. Iterate and improve your rules Finally, it uses a Bayesian label model [1] to convert these noisy annotations into a single, large, clean dataset, which you can then use for training machine learning models. You can programmatically label millions of datapoints in the time taken to hand-label hundreds. What we do differently from weak supervision packages like Snorkel/skweak [1] is to focus on UI to …  ( 2 min )
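    The write-rules-then-aggregate loop described above can be sketched in a few lines of plain Python. This is a hand-rolled illustration, not Programmatic's actual API, and it uses a simple majority vote where the real tool uses a Bayesian label model that weighs each rule by its estimated accuracy; all function and class names below are hypothetical.

```python
# Minimal weak-labeling sketch (illustrative only, not Programmatic's API):
# each labeling function returns a class label or None (abstain), and a
# majority vote aggregates the noisy votes per example. Real weak-supervision
# tools replace the vote with a learned (e.g. Bayesian) label model.
from collections import Counter

def lf_contains_price(text):          # rule: a price suggests the "commerce" class
    return "commerce" if "$" in text else None

def lf_greeting(text):                # rule: a greeting suggests the "chat" class
    return "chat" if text.lower().startswith(("hi", "hello")) else None

LABELING_FUNCTIONS = [lf_contains_price, lf_greeting]

def majority_vote(text):
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not None]
    if not votes:
        return None                   # every rule abstained
    return Counter(votes).most_common(1)[0][0]

labels = [majority_vote(t) for t in ["hello there", "only $5 today", "nothing here"]]
```

The appeal of the approach is exactly the REPL loop the post describes: each rule is cheap to write, and disagreements between rules are resolved by the aggregation step rather than by hand.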
    [D] What's your opinion on project promoting posts in this sub? Your vote matters.
    There are many project-promotion posts in this sub; you may like or dislike them. And if you disliked any of my posts, allow me to apologize first. However, it got me thinking. Several years ago I was a moderator of a quite large forum; because I didn't have enough time to fulfill my responsibilities, I decided to retire (yes, moderators can, and I remained as a VIP user, which only retired moderators can be). This is a large community, a machine learning community. Besides continuously removing some of these posts, with no clear rules on it, can we do any better? We have all the data, and we just cannot train the model? Here are my three proposals; please add better ideas beyond my poor ones: 1. A self-promoting post should have value beyond itself and avoid annoying content. 2. A self-promoted project can be used as a tool in a non-self-promoting post, as long as the post creates valuable content and the promotion is not obvious or annoying. 3. Depending on the number of new project posts, a weekly/daily project post can be created by a moderator and pinned to the top; all the promoting content goes into the comments, which we can explore and upvote. Here are some illustrations: 1. Direct Promoting Post 2. Indirect Promoting Post 3. Weekly/Daily Promoting Post by a moderator, pinned to top, comments by project owners, upvotes/downvotes by us. Which do you think is acceptable? Or do you have better ideas? Leave a comment. It's a machine learning sub; don't let the machine solve it better than us. View Poll submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 3 min )
    [R] Differentiable signal processing for optical communication with Google JAX
    Hey folks, I wrote a mini project based on JAX for optical communications signal processing: https://github.com/remifan/commplax I have a research article as a use case demo: https://remifan.github.io/gdbp_study/article.html This tool essentially: implements adaptive DSP equalizers as stateful NN layers (thanks to JAX's explicit stateful syntax); implements compositor interfaces from scratch to wrap up those stateful layers with other regular NN layers so that they can be trained together; and homebrews serial compositions of stateful layers. It is a fun project for me and I feel JAX really elegantly fits this research use. What do you think about JAX? I appreciate your comments :) submitted by /u/StreetPrice1909 [link] [comments]  ( 1 min )
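    The "stateful layers with explicit state" design the post describes can be sketched in plain Python. This is a hypothetical illustration of the JAX style, not commplax's actual API: each layer is a pure function (params, state, x) -> (y, new_state), and serial composition threads each layer's state through the chain, which is what lets an adaptive equalizer sit next to ordinary NN layers.

```python
# Sketch of explicit-state layer composition in the JAX style (plain Python,
# hypothetical names, not commplax's actual API). Each layer is a pure
# function (params, state, x) -> (y, new_state); `serial` threads per-layer
# state through the chain, so adaptive DSP blocks and regular layers compose.

def scale_layer(params, state, x):
    # stateless layer: state passes through unchanged
    return params["gain"] * x, state

def running_mean_remover(params, state, x):
    # "adaptive" layer: subtracts an exponentially updated running mean
    mean = state["mean"]
    new_mean = 0.9 * mean + 0.1 * x
    return x - mean, {"mean": new_mean}

def serial(layers):
    def apply(params_list, state_list, x):
        new_states = []
        for layer, p, s in zip(layers, params_list, state_list):
            x, s = layer(p, s, x)
            new_states.append(s)
        return x, new_states
    return apply

model = serial([scale_layer, running_mean_remover])
params = [{"gain": 2.0}, {}]
states = [None, {"mean": 0.0}]
y, states = model(params, states, 3.0)   # y = 2*3 - 0 = 6.0; mean moves toward 6.0
```

Keeping state explicit (rather than hidden in objects) is what makes such layers easy to jit, differentiate, and unit-test in JAX.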
    [D] Tracking the hardware usage while running CV NN Model on a 1000 Images
    Hi guys, I've been working on a machine learning project and I wanted to see how hardware resources are used when I run inference on, let's say, 1000 images. How could I calculate the CPU (running inference on CPU)/RAM workload in that timeframe? I'm running it on a Linux Ubuntu VM. Thanks in advance! submitted by /u/Fifi0912 [link] [comments]  ( 1 min )
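    One minimal way to get CPU time and peak memory on Linux uses only the standard library. This is a sketch under stated assumptions: `run_inference` is a placeholder for the real model loop, and `ru_maxrss` is reported in kilobytes on Linux (the unit differs on other platforms).

```python
# Minimal CPU/RAM measurement sketch for Linux using only the stdlib.
# `run_inference` stands in for the real inference loop over 1000 images.
# resource.getrusage reports CPU seconds used and peak resident set size
# (ru_maxrss is in kilobytes on Linux).
import resource
import time

def run_inference():                      # placeholder workload
    return sum(i * i for i in range(200_000))

wall_start = time.perf_counter()
run_inference()
wall = time.perf_counter() - wall_start

usage = resource.getrusage(resource.RUSAGE_SELF)
cpu_seconds = usage.ru_utime + usage.ru_stime      # user + system CPU time
peak_rss_mb = usage.ru_maxrss / 1024               # kB -> MB on Linux

print(f"wall: {wall:.2f}s  cpu: {cpu_seconds:.2f}s  peak RSS: {peak_rss_mb:.1f} MB")
```

For continuous per-interval sampling during a long run, the third-party psutil library (`Process.cpu_percent()`, `Process.memory_info()`) or a system tool like `pidstat` gives finer-grained numbers.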
    [D] Running interactive Python notebooks on HuggingFace Spaces
    I'm working on a framework Mercury for converting Python notebooks into interactive web apps. It can add widgets to the notebook based on the YAML configuration. End-user can tweak widgets values and execute the notebook. The resulting notebook can be downloaded as single-file HTML. Simple. The framework is built on Django+React. It is easy to deploy to Heroku or other cloud services. Recently, I made it possible to deploy it to Hugging Face Spaces (faster and larger machines than on free tier Heroku). The process of deployment is simple. You need to create a Gradio app on Spaces (my framework is not supported, yet ;) ). You need to add the app.py file that will run the Mercury server and upload the notebook. You can check the details in the docs. The HF Space with example notebook https://huggingface.co/spaces/pplonski/deploy-mercury submitted by /u/pp314159 [link] [comments]  ( 1 min )
    [D] Conditional GAN with multiple adversarial losses - Implementation?
    I would like to test the architecture from the following paper with a different dataset: https://www.mdpi.com/2072-4292/13/19/3834 The authors state that their objective function is the following: https://preview.redd.it/u78f27jb6nu81.png?width=1027&format=png&auto=webp&s=32790a67ec829a1e79b252edd0714b8b3b5a7f4e Where: x is the real grayscale image; s is its downsampled version, used both as the initial input of the generator performing the super-resolution and as the first conditional variable in the learning process; e is another two-dimensional array containing values for a second conditional variable. The authors, however, state that this should be implemented using two separate conditional adversarial losses, one for each conditional variable. To clarify, the first adversarial loss should be: AdvLoss1(ParametersG, ParametersD) = -Log(Discriminator(x, s)) - Log(1 - Discriminator(Generator(s), s)) while the second would be: AdvLoss2(ParametersG, ParametersD) = -Log(Discriminator(x, e)) - Log(1 - Discriminator(Generator(s), e)) which should then be summed up for the backward pass. In my PyTorch implementation, however, I have only been able to set up a single adversarial loss, which could be defined as: CurrentAdvLoss(ParametersG, ParametersD) = -Log(Discriminator(x, (s, e))) - Log(1 - Discriminator(Generator(s), (s, e))) I have implemented this as follows (simplified version), calculating errD and errG in a training loop (simplified version, from the same question asked on the PyTorch forum) after conditioning the network on both s and e at the same time: https://discuss.pytorch.org/t/conditional-gan-with-multiple-adversarial-losses/149627 My question is: is there a way to modify this loop to obtain outputs that are separately conditioned first on s and then on e, and thus calculate the two separate adversarial losses originally proposed by the authors?
submitted by /u/Franken91 [link] [comments]  ( 2 min )
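    The two-loss formulation in the question can be sketched numerically. This is plain Python with toy scalar values standing in for discriminator outputs (in the real model they would come from D(x, cond) and D(G(s), cond)); the point is simply that each conditional variable gets its own adversarial term and the two terms are summed before backpropagation.

```python
# Numeric sketch of two separate conditional adversarial losses (plain
# Python; d_real/d_fake are toy stand-ins for discriminator outputs).
# Each conditional variable (s, then e) gets its own adversarial term,
# and the terms are summed before the backward pass.
import math

def adv_loss(d_real, d_fake):
    # standard GAN discriminator loss for one conditioning variable:
    # -log D(real|cond) - log(1 - D(fake|cond))
    return -math.log(d_real) - math.log(1.0 - d_fake)

loss_s = adv_loss(d_real=0.9, d_fake=0.2)   # pass conditioned on s
loss_e = adv_loss(d_real=0.8, d_fake=0.3)   # pass conditioned on e
total = loss_s + loss_e                      # summed for the backward pass
```

In PyTorch terms this amounts to two discriminator forward passes per batch (one with s concatenated to the input, one with e) and a single `(loss_s + loss_e).backward()` call, since autograd accumulates gradients from a summed loss.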
    [D] IJCAI 2022 Paper Notification
    This is the discussion for accepted/rejected papers in IJCAI 2022. Results are supposed to release today. submitted by /u/errohan400 [link] [comments]  ( 1 min )
    [R] Authors Claim to Have "Solved" MNIST and CIFAR
    Paper: https://arxiv.org/abs/2204.07953v1 Code: https://github.com/decurtoydiaz/learning_with_signatures Tangential resources of interest: https://arxiv.org/abs/1905.08494, https://en.wikipedia.org/wiki/Rough_path#Signature, and https://labelerrors.com/ Personally, I believe from their code on Github, they have a possible data leakage (in the same vein of the current issue raised there) as well as an accuracy of 100% on a test set is fishier than a fish market. However, I am very curious to hear from the court of public opinion. How is everyone feeling about this? submitted by /u/blingblingbeepbeep [link] [comments]  ( 4 min )
    [D] How do I evaluate if my data represent the target variable before training a machine learning algorithm?
    I have a point-cloud dataset where each point has an associated variable. I am trying to relate local geometry features to that per-point variable using FPFH. This means I am generating my own features from the dataset by first using an area of n points to compute normal-vector estimations, and from x normal-vector estimations computing the FPFH. However, the numbers x and n are arbitrary, and other combinations might describe the target variable better. So I wanted to know if there is a method to evaluate how good given x and n values are at describing the target variable. I considered the correlation between the features (n, x) and the target variable, but I read that this assumes a linear relationship. I am using scikit-learn. So basically I have features X(x, n) and a target variable Y; which x and n, in the feature space X(x, n), describe the target variable Y best? I want to do this before training, because when I try to train my random forest regressor it takes 3-4 hours and I want to test more combinations. submitted by /u/Neo-Rushdian [link] [comments]  ( 1 min )
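    One model-free way to score feature settings without assuming linearity is mutual information; scikit-learn provides `sklearn.feature_selection.mutual_info_regression` for exactly this. As a dependency-free illustration of the idea, here is a simple histogram-based estimate in plain Python (sklearn's version uses a more careful nearest-neighbour estimator).

```python
# Sketch of a histogram-based mutual information estimate between one
# feature and the target (plain Python; sklearn's mutual_info_regression
# does this more carefully). Unlike Pearson correlation, MI also picks up
# nonlinear dependence, so it can rank (x, n) feature settings cheaply
# before committing to a 3-4 hour random forest fit.
import math
from collections import Counter

def mutual_information(xs, ys, bins=8):
    def binned(vals):
        lo, hi = min(vals), max(vals)
        width = (hi - lo) / bins or 1.0
        return [min(int((v - lo) / width), bins - 1) for v in vals]
    bx, by = binned(xs), binned(ys)
    n = len(xs)
    px, py = Counter(bx), Counter(by)
    pxy = Counter(zip(bx, by))
    mi = 0.0
    for (i, j), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log(p_joint / ((px[i] / n) * (py[j] / n)))
    return mi  # in nats; higher means the feature says more about the target

# nonlinear relationship y = x^2: near-zero Pearson correlation, clear MI
xs = [x / 10 for x in range(-50, 51)]
ys = [x * x for x in xs]
print(mutual_information(xs, ys))
```

Computing this score per (x, n) candidate is fast enough to sweep many combinations, keeping only the best few for the expensive regressor.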
    [Discussion] Training performance evaluation of MindSpore, a home-grown deep learning framework -- by ADSL Lab, CSU
    The article is reproduced from Zhihu, using deepl machine translation, for all enthusiasts to communicate Abstract Deep learning frameworks are the engines and motors for pushing the boundaries of artificial intelligence applications, and good deep learning frameworks can dramatically shorten the cycle of algorithm innovation and validation. In this report, we focus on the newly launched MindSpore framework, which has received a lot of industry attention, and systematically explore its model training speed on GPU clusters and compare it with popular international frameworks. In the evaluation experiments, we choose two classical models, ResNet and BERT-base, to test and analyze their performance with the same algorithm, the same dataset, and the same or similar performance hardware platf…  ( 7 min )
    [D] Why is the diffusion model so powerful when the math behind it is so simple?
    You can see the 200-line code here: https://nn.labml.ai/diffusion/ddpm/index.html and https://github.com/cloneofsimo/minDiffusion ; the math is here: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ The algorithm is smart and simple, but its generated results seem more impressive than GANs', its speed is fast, and the model size is not too big: https://openai.com/dall-e-2/ , https://huggingface.co/spaces/multimodalart/latentdiffusion , https://www.reddit.com/r/dalle2 So the 1st question: why is the diffusion model so powerful? Can someone explain it? 2nd question: has anyone used diffusion for NLP? UPDATED: "A multiverse portal to a new world opening up above Tokyo" by dalle2 (from r/dalle2) "A robot painting on a canvas while playing the piano" by dalle2 (from r/dalle2) "Mona Lisa in her studio painting Leonardo da Vinci" by dalle2 (from r/dalle2) "Science fiction illustration: future city in the night | impressionism" by latentdiffusion "Science fiction illustration of beauty and monsters | impressionism" by latentdiffusion "a painting of a girl with a fox sitting in a field at sunrise in the style of Claude Monet" by latentdiffusion submitted by /u/ghosthamlet [link] [comments]  ( 4 min )
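    Part of why the math fits in 200 lines: the standard DDPM forward (noising) process has a closed form that lets you sample any timestep directly. A sketch of that formulation in plain Python (illustrating the standard recipe from the references above, not any specific repo's code):

```python
# The DDPM forward (noising) process in a few lines of plain Python.
# With a variance schedule beta_t, the closed form
#   x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
# samples any timestep directly, and the network is trained simply to
# predict `noise` from x_t. (Sketch of the standard formulation.)
import math
import random

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]   # linear schedule
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b                    # alpha_bar_t = prod of (1 - beta_i)
    alpha_bars.append(prod)

def q_sample(x0, t, noise):
    a = alpha_bars[t]
    return math.sqrt(a) * x0 + math.sqrt(1.0 - a) * noise

random.seed(0)
x0 = 1.0
print(q_sample(x0, 10, random.gauss(0, 1)))    # early step: still close to x0
print(q_sample(x0, 999, random.gauss(0, 1)))   # late step: essentially pure noise
```

Training then reduces to a plain regression loss on the predicted noise, which is part of why the objective is so much more stable than a GAN's adversarial game.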
    [D] Questions about Intel 12th gen Alder Lake CPUs
    I am looking to build a new PC but have struggled to find the info on how Intel's latest CPUs perform for data science/ML, so if anyone is using one for that purpose and can help with one or more of these questions it would be very helpful! Apologies if these questions should be directed elsewhere. I am planning to use WSL2/Ubuntu but have heard that Intel's thread director isn't implemented well yet in Linux (or Windows 10!), so it doesn't assign tasks properly. Has anyone experienced issues with this firsthand? Assuming the thread director is working, are the e-cores utilised at all in any typical DS workflows? E.g. will they get used with joblib or when training scikit-learn/gbms in parallel? Are the e-cores good enough to handle other stuff like web browsing etc whilst the p-cores are maxed out on model training, or is it still necessary to keep at least one p-core free to avoid crashing the PC? Also I have read that in Windows 11 (where the thread director works best) that the active window/tab could be assigned p-cores as a priority, which isn't very helpful for someone who needs to train models in the background etc, but not sure whether this is actually happening in practice. The consensus from benchmarks/reviews is that the hybrid architecture 'just works' and is superior to AMD right now, but those benchmarks are primarily for use in gaming/video editing. submitted by /u/FightingLikeBeavers [link] [comments]  ( 1 min )
    [N] The new Machine Learning Specialization by DeepLearning.AI and Stanford Online is launching soon! Join the Waitlist.
    We’re thrilled to announce a brand new Machine Learning Specialization, in collaboration with DeepLearning.AI, launching in June on Coursera! Learn essential real-world skills from AI pioneer Andrew Ng, who co-founded Google Brain and Coursera, led AI research at Baidu, and has impacted millions of AI learners. This updated 3-course Specialization will cover the latest machine learning techniques as well as foundational AI concepts that made its predecessor one of the world’s most popular machine learning courses. Join the waitlist! https://preview.redd.it/yujr31t6vku81.png?width=5000&format=png&auto=webp&s=0f4c4ef090bcdc7cfb04ee2c817d766f23c236a6 submitted by /u/Stanford_Online [link] [comments]  ( 1 min )
  • Open

    How Meta's multiverse could prove our universe is a fake
    submitted by /u/estasfuera [link] [comments]
    SingularAgent - Many Methods Make Light Work
    submitted by /u/dantheman333 [link] [comments]
    General AI In Healthcare | Machine Learning For Cardiovascular Disease | Color Night Vision
    submitted by /u/getrich_or_diemining [link] [comments]
    A realistic image AI software
    submitted by /u/Eurokiwiboy [link] [comments]
    Ant colony simulation
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    Is there any free open source AI model available for answering any bible related queries?
    A few years back I developed a very simple app just to show a few Bible verses. Though it is a very simple app, it got more than 50K installs without much promotion. So I am thinking about promoting it, but hesitate because it is such a simple app. So I would like to add some useful feature before I start promoting it. I would like to add a feature that allows users to ask any question related to the Bible and gives a relevant answer. I assume that some Bible data is open source. Is there any free tutorial available on how to implement an AI-based chat system for answering Bible-related queries after training on Bible data? Is there any app already providing this feature? submitted by /u/qptbook [link] [comments]  ( 1 min )
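    Before reaching for a large model, a simple retrieval baseline over a public-domain translation (e.g. the KJV) often covers "show me relevant verses" queries. A hypothetical sketch in plain Python, with a three-verse dictionary standing in for the full text:

```python
# Tiny retrieval baseline for "find relevant verses" queries (plain Python;
# the verse dict is a hypothetical stand-in for a full public-domain Bible
# text). Scoring is word overlap; a real app would use TF-IDF or embeddings,
# but the structure (tokenize, score, return best match) is the same.
def tokenize(text):
    cleaned = text.lower().replace(",", "").replace(".", "").replace(";", "")
    return set(cleaned.split())

VERSES = {
    "John 3:16": "For God so loved the world, that he gave his only begotten Son.",
    "Psalm 23:1": "The Lord is my shepherd; I shall not want.",
    "Genesis 1:1": "In the beginning God created the heaven and the earth.",
}

def best_verse(query):
    q = tokenize(query)
    return max(VERSES, key=lambda ref: len(q & tokenize(VERSES[ref])))

print(best_verse("who created the earth"))
```

Swapping the overlap score for sentence embeddings later upgrades this to semantic search without changing the overall shape of the code.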
    Today, AI is becoming ubiquitous, in and out of the workplace. With artificial intelligence (AI) becoming more powerful, the questions that surround AI ethics are becoming more relevant.
    But can technology be controlled to avoid adverse outcomes? Let's understand how AI will help us to make a better world. https://us.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]  ( 1 min )
    Top Ethical Challenges in AI – The Price of Progress
    submitted by /u/JencyJane [link] [comments]
    Artificial Nightmares: Dr. Strange || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Weekly China AI News: Chinese Prominent AI Lab Plagiarizes Big Model Paper; Microsoft Research Asia Halts Internship Hiring from US-Banned Universities; Beijing Announces New RISC-V Chip Institute
    submitted by /u/trcytony [link] [comments]  ( 1 min )
  • Open

    French palindromes and Morse code
    I got an email from a student in France who asked about a French counterpart to my post on Morse code palindromes, and this post is a response to that email. Palindromes A palindrome is a word that remains the same when the letters are reversed, like kayak. A Morse code palindrome is a word […] French palindromes and Morse code first appeared on John D. Cook.  ( 2 min )
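    The idea in the post translates directly to code: encode a word in standard ITU Morse, concatenate the dots and dashes, and test whether the string reads the same reversed. A short sketch:

```python
# Morse-code palindrome check, following the idea in the post: encode the
# word in standard ITU Morse, concatenate the dots and dashes, and test
# whether the resulting string equals its own reverse.
MORSE = {
    "a": ".-",   "b": "-...", "c": "-.-.", "d": "-..",  "e": ".",
    "f": "..-.", "g": "--.",  "h": "....", "i": "..",   "j": ".---",
    "k": "-.-",  "l": ".-..", "m": "--",   "n": "-.",   "o": "---",
    "p": ".--.", "q": "--.-", "r": ".-.",  "s": "...",  "t": "-",
    "u": "..-",  "v": "...-", "w": ".--",  "x": "-..-", "y": "-.--",
    "z": "--..",
}

def is_morse_palindrome(word):
    code = "".join(MORSE[c] for c in word.lower())
    return code == code[::-1]

print(is_morse_palindrome("sos"))   # "...---..." reads the same reversed
```

Note that a letter palindrome like "kayak" need not be a Morse palindrome: reversing ".-" (a) gives "-." (n), so the letters' codes must reverse into each other, which is what makes the search interesting.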
    Blaschke factors
    Blaschke factors are complex functions with specified zeros inside the unit disk. Given a complex number a with |a| < 1, the Blaschke factor associated with a is the function Notice the semicolon in b(z; a). This is a convention that a few authors follow, and that I wish more would adopt. From a purely […] Blaschke factors first appeared on John D. Cook.  ( 2 min )
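    The summary drops the displayed formula after "is the function". For reference, the standard definition matching the b(z; a) notation in the text (one common normalization includes the |a|/a factor so that b(0; a) = |a| > 0) is:

```latex
b(z; a) = \frac{|a|}{a} \cdot \frac{a - z}{1 - \bar{a} z}
```

As the text notes, the semicolon separates the function's argument z from the parameter a.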
  • Open

    AI Application Development Guide for Business Owners
    To start deeply investigating the AI app development process, it’s important to first understand how these projects differ from regular app…  ( 9 min )
  • Open

    Neural Network gets too large and dies
    Hi, I've been working on a project for my computer science class and everything has been working up until the training. I'm following a guide online that has worked fairly well. Whenever I try to train, however, I run into an overflow error and the entire network dies. I'm not sure where to go from here as I've tried a few steps to fix the issue, if anyone could offer up some advice to fixing my problem that would be amazing. submitted by /u/djm710 [link] [comments]  ( 2 min )
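    A common culprit for overflow errors in from-scratch networks is `exp()` in the sigmoid receiving a large-magnitude input (often itself a symptom of a too-high learning rate or unscaled input features). This is a hedged guess at the failure mode above, but a numerically stable sigmoid removes the crash itself:

```python
# math.exp(1000) raises OverflowError, so a naive sigmoid dies on large
# activations. This variant only ever exponentiates a non-positive number
# (exp underflows gracefully to 0.0 instead of overflowing). It fixes the
# crash; if activations are huge, also check learning rate / input scaling.
import math

def stable_sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))      # exp of a non-positive number: safe
    z = math.exp(x)                            # x < 0, so exp(x) < 1: safe
    return z / (1.0 + z)

print(stable_sigmoid(1000.0))    # no OverflowError
print(stable_sigmoid(-1000.0))   # no OverflowError
```

The same trick (subtracting the max before exponentiating) is the standard fix for softmax overflow.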
    7+ Best Books to Learn Neural Networks in 2022 for Beginners (Updated) -
    submitted by /u/maneesh123456 [link] [comments]
    Question about Sigmoid and Heaviside
    I read a paper and was a little bit confused. The paper said: "Imagine you have a two-dimensional binary-input classification (0 or 1) problem and you use Sigmoid as an activation function. Since the Sigmoid gives you a real number between 0 and 1, it's not really classification anymore. Therefore you take the output of the Sigmoid (y_Sigmoid) and put it into a shifted Heaviside function H(y - 0.5) (so for y_Sigmoid greater than 0.5, it gives you y_Heavi = 1). The decision boundary is given by a straight line w1a1 + w2a2 + w0 = 0, and this whole process only works with the Sigmoid function as the first activation function." The last paragraph confused me. Why can I assume that the decision boundary is exactly that (it's just the "normal" decision boundary for a single-layer perceptron; why does it also work here), and why does it only work with the Sigmoid as the first activation function? submitted by /u/LawlHeyman [link] [comments]  ( 1 min )
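    On the first part of the question, the claimed boundary follows from monotonicity: the sigmoid is strictly increasing with σ(0) = 0.5, so σ(z) ≥ 0.5 exactly when z ≥ 0, and thresholding the sigmoid output at 0.5 is identical to thresholding the pre-activation z = w1·a1 + w2·a2 + w0 at 0. A quick numeric check of that equivalence:

```python
# Why the boundary is the line w1*a1 + w2*a2 + w0 = 0: sigmoid is strictly
# increasing with sigmoid(0) = 0.5, so sigmoid(z) >= 0.5 iff z >= 0. The
# Heaviside applied to the sigmoid output therefore classifies purely by
# the sign of z, i.e. by the usual linear decision boundary.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(a1, a2, w1, w2, w0):
    z = w1 * a1 + w2 * a2 + w0
    return 1 if sigmoid(z) >= 0.5 else 0     # identical to: 1 if z >= 0 else 0

w1, w2, w0 = 1.0, -1.0, 0.5
for a1, a2 in [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]:
    z = w1 * a1 + w2 * a2 + w0
    assert classify(a1, a2, w1, w2, w0) == (1 if z >= 0 else 0)
```

The same argument works for any strictly monotonic activation if the Heaviside threshold is set to the activation's value at zero, so the 0.5 threshold in particular is what ties this construction to the sigmoid.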
    Starting a neural network
    I want to create a program that can take music I feed it and, over time, create its own music based on the inputs. I know I have to use a neural network and deep learning algorithms, but how do I get started? Thanks. submitted by /u/Saxy-Snark [link] [comments]  ( 2 min )
  • Open

    Offline RL Made Easier: No TD Learning, Advantage Reweighting, or Transformers
    A demonstration of the RvS policy we learn with just supervised learning and a depth-two MLP. It uses no TD learning, advantage reweighting, or Transformers! Offline reinforcement learning (RL) is conventionally approached using value-based methods based on temporal difference (TD) learning. However, many recent algorithms reframe RL as a supervised learning problem. These algorithms learn conditional policies by conditioning on goal states (Lynch et al., 2019; Ghosh et al., 2021), reward-to-go (Kumar et al., 2019; Chen et al., 2021), or language descriptions of the task (Lynch and Sermanet, 2021). We find the simplicity of these methods quite appealing. If supervised learning is enough to solve RL problems, then offline RL could become widely accessible and (relatively) easy to implemen…  ( 5 min )
  • Open

    Search for knowledge in Quip documents with intelligent search using the Quip connector for Amazon Kendra
    Organizations use collaborative document authoring solutions like Salesforce Quip to embed real-time, collaborative documents inside Salesforce records. Quip is Salesforce’s productivity platform that transforms the way enterprises work together, delivering modern collaboration securely and simply across any device. A Quip repository captures invaluable organizational knowledge in the form of collaborative documents and workflows. However, finding […]  ( 6 min )
  • Open

    How Microsoft Power BI Revolutionizes Business
    As cloud-based business intelligence becomes more and more popular in the market, one name has made quite a mark: Power BI. A Microsoft offering, Power BI is an interactive data visualization and analytics tool that promises to revolutionize business. Here are some of its key benefits to help you see how it can do that:… Read More »How Microsoft Power BI Revolutionizes Business The post How Microsoft Power BI Revolutionizes Business appeared first on Data Science Central.  ( 3 min )
    How AI and ML are transforming data quality management?
    Introduction In recent years technology has become prominent, both at work and at home. Machine learning (ML) and Artificial Intelligence (AI) are evolving quickly today. Almost everyone will have some interaction with a form of AI daily. Some common examples include Siri, Google Maps, Netflix, and social media (Facebook/Snapchat). AI and ML have become popularly used buzzwords… Read More »How AI and ML are transforming data quality management? The post How AI and ML are transforming data quality management? appeared first on Data Science Central.  ( 4 min )
    Agile, Agile 2 and Agility, Part II
    In the previous article in this series, we discussed the difference between Agile and business agility and how Agile 2 addresses some of the omissions and failings of traditional Agile.  Both Agile and Agile 2 focus on accelerating digital development; however, the benefits of any Agile approach can be obviated if it is not implemented… Read More »Agile, Agile 2 and Agility, Part II The post Agile, Agile 2 and Agility, Part II appeared first on Data Science Central.  ( 4 min )

  • Open

    Do regulatory data projects really need design-time data lineage? Probably not.
    Your regulatory data project likely has no use case for design-time data lineage. tl/dr Mapping Data Lineage at design time, for its own end, has no regulatory use case or ROI.  Buying a specialist tool to support that mapping has even less ROI.  Regulations see that kind of documentary data lineage as ancillary at best.… Read More »Do regulatory data projects really need design-time data lineage? Probably not. The post Do regulatory data projects really need design-time data lineage? Probably not. appeared first on Data Science Central.  ( 10 min )
    Dark Energy, Dark Data
    During the 1990s, the physics community began to measure the brightness of certain supernovae in a novel way. This new method supported the conclusion Edwin Hubble had first arrived at in 1929 after discovering that galaxies are becoming more and more distant from us: Dark matter and dark energy play a role in why those… Read More »Dark Energy, Dark Data The post Dark Energy, Dark Data appeared first on Data Science Central.  ( 4 min )
    5 Main Benefits of Distributed Cloud Computing
    According to the predictions of Gartner, by 2024, distributed cloud computing opportunities will be offered by most cloud vendors on a service basis. With the increasing rush in the cloud space and the digitalization of documentation, this industry is bound to grow. Understanding Distributed Cloud Distributed cloud is an innovation on traditional cloud computing. It means… Read More »5 Main Benefits of Distributed Cloud Computing The post 5 Main Benefits of Distributed Cloud Computing appeared first on Data Science Central.  ( 5 min )
  • Open

    Speeding Up AI Algorithms- Inferencing challenges at the edge
    submitted by /u/Chipdoc [link] [comments]
    Build & share machine learning apps directly in browser using Gradio in Python
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    What if You Are a Prototype for the Ultimate Sentient Artificial Intelligence?
    submitted by /u/IndependenceFun4627 [link] [comments]  ( 1 min )
    Overview of Relational Graph Convolutional Networks (RGCN)
    submitted by /u/aidev2040 [link] [comments]
    There are so many crappy chatbots because people don't pay attention to how they're performing. If you're one of them, here are metrics to keep in mind
    Hi there! Chatbots are not a "set and forget" thing like much other software. If you want to achieve great results with your chatbot, you need to improve it constantly. To know where and what to improve, you need to track and monitor chatbot analytics and the main chatbot metrics. General chatbot metrics: total number of users, user satisfaction, accuracy of the chatbot. Engagement metrics: active users, new users, conversation length, retention rate, bounce rate, flow completion rate. Conversational analytics: Goal Completion Rate (GCR), fallback rate, human takeover rate. Bonus, revenue metrics: revenue generated, ROI / payback period. In the article we covered how to calculate each metric, and you can find the metrics you need depending on the industry you work in: https://botscrew.com/blog/chatbot-metrics/?utm_source=RedditArtificial&utm_medium=&utm_campaign=&utm_term=&utm_content= submitted by /u/Avandegraund [link] [comments]  ( 1 min )
    Wake-up Call for Science – AI System Develops 40,000 Chemical Weapons in 6 Hours
    submitted by /u/TheCnt23 [link] [comments]
    Stopping 'them' from spying on you: New AI can block rogue microphones
    submitted by /u/KelliaMcclure [link] [comments]
    which courses are good for complete beginners?
    Hello everyone, can someone recommend some good courses for me? I saw some courses on Udemy; is this one worth it? https://www.udemy.com/course/artificial-intelligence-az/ Or can I learn everything on YouTube? There are a few more on Udemy, but I don't know how good they are. Is it worth buying one of those, or are there better videos on YouTube? EDIT: I found another 4 courses: https://www.udemy.com/course/100-days-of-code/ https://www.udemy.com/course/complete-python-bootcamp/ https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/ https://www.udemy.com/course/machinelearning/ Which one of them would you recommend the most? submitted by /u/Edrixor [link] [comments]  ( 1 min )
    AI will make us dumb: [2204.07888] AI, Ageing and Brain-Work Productivity: Technological Change in Professional Japanese Chess
    submitted by /u/kg4jxt [link] [comments]  ( 1 min )
    Any good resources to learn Default Theory?
    I am having a difficult time understanding default theory and the various methods, e.g. Makinson's, for finding the extensions of default theories. submitted by /u/cocag13996 [link] [comments]
    I know that the voice in this video is made using Replica Studio's engine, but does anyone know which voice exactly was used?
    I looked through the available ones; not a single one seems to match it. Sorry if this isn't the right sub to ask, but since Replica Studios doesn't have its own sub, I don't know where else to go. submitted by /u/AxySmarts [link] [comments]  ( 1 min )
    Artificial Nightmares: Schizophrenia || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
  • Open

    [D] Resources for Images Anomaly Detection
    Hello all, I know that there is a lot going on in this field. I would like to get started on it and study more, and as always, I like to start from the basics. Do you have any resources (videos, articles, books) that are good to start with? I know there are autoencoders and statistical models, but how do I learn more? Where/how do you keep studying? submitted by /u/bollolo [link] [comments]  ( 1 min )
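    One basic starting point from the autoencoder family: score a sample by its reconstruction error under a model fit to normal data. A toy sketch with a linear stand-in for an autoencoder (one principal component) on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "normal" data lying near a 1-D line inside 2-D space
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.3])

# A linear stand-in for an autoencoder: project onto the top principal component
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean)
reconstruct = lambda x: mean + ((x - mean) @ Vt[0]) * Vt[0]

def anomaly_score(x):
    # Reconstruction error: large when x is far from the learned manifold
    return np.linalg.norm(x - reconstruct(x))

on_manifold = np.array([3.0, 0.0])
off_manifold = np.array([0.0, 5.0])
print(anomaly_score(on_manifold) < anomaly_score(off_manifold))  # True
```

    A real autoencoder replaces the linear projection with a learned encoder/decoder, but the scoring idea is the same.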
    [R] Where can I find case studies on different ML projects?
    I am working on my research paper and would like to find resources which show the case studies of ML projects from the beginning to the end, doesn't matter if it failed or succeeded. submitted by /u/mkonu [link] [comments]
    [D] Create Labels for Data created by a GAN
    Hello there! I hope you're having a great day! Currently I want to compare how good multiple GANs (vanilla GAN, WGAN, DCGAN, ...) are for a given use case. Therefore I trained the various GAN versions on data from two different classes (i.e. apples and bananas). Now I want to show that data generated with the Generator can be used to train, for example, a classifier that can distinguish between real images of apples and bananas. Can I somehow create labels for the data I generate with the Generator in a smart way, so that I know a generated image should, for example, be an apple? How do I do that? submitted by /u/Bonkikong [link] [comments]  ( 1 min )
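    Two common answers, sketched below with hypothetical stand-in functions: make the GAN class-conditional (the label is an input to the generator, so it is known by construction), or pseudo-label samples from an unconditional generator with a classifier trained on the real data:

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_generator(z, label):
    # Stand-in for a class-conditional generator: the label is an *input*,
    # so every generated sample's label is known by construction.
    return np.concatenate([z, np.eye(2)[label]])

def real_data_classifier(sample):
    # Stand-in for a classifier trained on real apples/bananas, usable to
    # pseudo-label samples from an *unconditional* generator instead.
    return int(sample[-1] > 0.5)

fake = conditional_generator(rng.normal(size=4), label=1)
print(real_data_classifier(fake))  # 1: the conditioning label is recovered
```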
    [P] Luminide: new optimization Early Ranking achieves higher accuracy AI models
    Luminide introduces a new optimization, called Early Ranking, which makes it easier to build better AI models. Early Ranking achieves the same AI training results with up to 10x less compute – this saves time, reduces costs, and increases model accuracy. Luminide's IDE is a customized version of JupyterLab with integrated AI dev tools. Luminide used Early Ranking to place in the Top 1% of the CVPR Plant Pathology Kaggle competition. You can read about how we developed our winning model (and how you can too) in our new blog post: Better Automation for Higher Accuracy AI Models. Class activation maps give insights into Luminide's winning model. Luminide is a new cloud platform for AI model development. Check out our demo video for a quick overview, or try it for yourself (sign up today and receive 100 hours of free GPU cloud compute). submitted by /u/LuminideInc [link] [comments]  ( 1 min )
    [D] generic discussion on freelance ML engineers
    Hi, Reddit. Recently I've been looking into the freelancer career path. Currently I'm a researcher at a top company. So far I know there are Toptal, Upwork, and Freelancer. I checked them out, and it seems with Toptal you still end up working for a large corporation, mostly as a full-time contractor, which is not really a different or better option than my current work. Freelancer has too many bidders from developing countries. Besides what platform to use, I have more questions about what obstacles we face as freelance ML engineers. Even though I'm in AI and a researcher, I have never deployed a model in production. Usually a task at a big company takes a team or multiple teams to complete the MLOps lifecycle; how can you do it as a single person? Any sharing of experience would be of great help. submitted by /u/meame2010 [link] [comments]  ( 2 min )
    [R] Looking for AI/ML experts from Southeast Asia to interview for master thesis
    Hello everyone, I am a student from Germany writing my master thesis on Digital Transformation in ASEAN with AI/ML. For my thesis I would like to interview AI/ML experts from the ASEAN region to talk about the digital development of each country, challenges and potentials. (If you are not native there, but you have a work connection or just knowledge about the region and its AI development, I appreciate that as well.) It would be awesome if some of you were open to talk to me. A few sentences are enough, I won't take much of your time. If you want, we can do a video call as well. I will quote you of course. Thank you guys. submitted by /u/BlueLagoon357 [link] [comments]  ( 1 min )
    Dealing with numerically 0 likelihood in probabilistic models [R]
    I'm trying to find literature on solving the following issue: in most probabilistic ML models, we model the joint distribution over a set of random variables, p(x1, ..., xN). If N is very large (e.g. 100, 500, or even 1000), then regardless of how you model this, the distribution's highest point of density is still quite tiny. E.g. if you consider an isotropic multivariate Gaussian of 100 dimensions, the highest point of density will be somewhere in the neighbourhood of 1.6e-40. So when it comes time to evaluate the likelihood for a model like this, the probability is numerically 0, and the log probability goes to negative infinity. Is there work on solving these kinds of issues, i.e. by constraining the model in some way, scaling the model output, etc.? I've done some googling, but am having a hard time finding papers on the subject. I'm not even sure what to call the problem... curse of dimensionality in PGMs? Any recommendations of papers / talks / etc. are greatly appreciated! submitted by /u/CS_Student95 [link] [comments]  ( 1 min )
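    The usual remedy is to never materialize the likelihood itself and instead work entirely in log space: sum log-densities term by term (or use log-sum-exp for mixtures) rather than exponentiating. A quick illustration with the isotropic Gaussian example:

```python
import numpy as np

d = 1000                 # dimensionality from the post's example range
x = np.zeros(d)          # evaluate at the mode of a standard isotropic Gaussian
log_p = -0.5 * d * np.log(2.0 * np.pi) - 0.5 * x @ x
p = np.exp(log_p)

print(p)      # 0.0 -- the density underflows float64
print(log_p)  # about -918.9 -- perfectly representable in log space
```

    Libraries expose this directly (e.g. `logpdf` methods); the log-likelihood is finite and well behaved even when the likelihood itself is not representable.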
    [R][P] GAN-Control: Explicitly Controllable GANs + Gradio Web Demo
    https://i.redd.it/v61jw1fekiu81.gif Abstract: We present a framework for training GANs with explicit control over generated facial images. We are able to control the generated image by setting exact attributes such as age, pose, expression, etc. Most approaches for manipulating GAN-generated images achieve partial control by leveraging the latent space disentanglement properties, obtained implicitly after standard GAN training. Such methods are able to change the relative intensity of certain attributes, but not explicitly set their values. Recently proposed methods, designed for explicit control over human faces, harness morphable 3D face models (3DMM) to allow fine-grained control capabilities in GANs. Unlike these methods, our control is not constrained to 3DMM parameters and is extendable beyond the domain of human faces. Using contrastive learning, we obtain GANs with an explicitly disentangled latent space. This disentanglement is utilized to train control-encoders mapping human-interpretable inputs to suitable latent vectors, thus allowing explicit control. In the domain of human faces we demonstrate control over identity, age, pose, expression, hair color and illumination. We also demonstrate control capabilities of our framework in the domains of painted portraits and dog image generation. We demonstrate that our approach achieves state-of-the-art performance both qualitatively and quantitatively. submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [D] Who funds the leading conferences in the field?
    I know that the publishers of the leading journals are mostly for-profit organizations, which is weird because, as researchers in the field, we really "volunteer" free peer review and even pay to publish and to read papers. On the other hand, I wasn't able to find information about the funding and profit goals of the leading conferences. Take NeurIPS for example: I found that it is organized by the "NeurIPS Foundation", but what exactly is this foundation? I couldn't find any information on the subject. My point is, if the conferences are non-profit, it sounds like they should be preferred over funding for-profit organizations. submitted by /u/Careful_Winner_2335 [link] [comments]  ( 1 min )
    [D] NLP has HuggingFace, what does Computer Vision have?
    I've been writing tutorials with Pinferencia and HuggingFace. HuggingFace is quite handy and easy to use. I want to write some tutorials about computer vision afterwards. Is there anything similar in the computer vision area? submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 2 min )
    [D] Why do papers in Speech Emotion Recognition not train on multiple datasets?
    I took a look at several of them and was curious why they benchmark on multiple datasets but restrict training to only one instead of merging them. As a result, they get good scores on the dataset they trained on, but bad ones on the rest. submitted by /u/raysamram [link] [comments]  ( 1 min )
    [P] SparseServer.UI : A UI to test performance of Sparse Transformers
    You can now load multiple transformers (each model has a unique sparsification recipe) on top of the DeepSparse server behind Streamlit, and it's open-source. This was battle-tested on a virtual machine with 16GB of RAM and only a 4-core CPU. These compute requirements are enough to load up to 19 sparse BERT models in memory and compare their performance on question answering (P.S. they are really fast on just CPUs). 💻code: https://github.com/neuralmagic/deepsparse/tree/main/examples/sparseserver-ui submitted by /u/Quantum_Stat [link] [comments]  ( 1 min )
    [Research] Learning with Signatures
    This paper reports "results on AFHQ dataset, Four Shapes, MNIST and CIFAR10 achieving 100% accuracy on all tasks." The authors used few-shot classification "by comparing each test sample (after optional augmentation and computation of the element-wise mean) against a representative element-wise mean signature computed by averaging the signatures of a given number of train samples." What are your thoughts on this? Learning with Signatures - https://arxiv.org/abs/2204.07953 submitted by /u/Marmadelov [link] [comments]  ( 1 min )
    [Project] [Research] Simple Speech Recognition System
    Github - Bangla Spoken Number Recognition Dataset - Our custom dataset of Bangla numerals Publications - though it's on (0-9) digits We have created a simple speech recognition system for recognizing Bangla numerals from '০-৯৯' (0-99). In this project, audio samples from different genders, age groups, and dialects of Bangladeshi people were used to create a speech dataset of spoken numbers from '০-৯৯' (0-99). The raw speech data is subjected to various audio augmentation techniques such as time shifting, speed tuning, background noise mixing, and volume tuning. Then, to extract meaningful features from the data, Mel Frequency Cepstral Coefficients (MFCCs) are used. We used Convolutional Neural Networks (CNNs) to develop the Bangla number recognition system. The proposed method recognizes '০-৯৯' (0-99) Bangla spoken numbers with 89.61% accuracy across the entire dataset. The model's effectiveness was also tested using 10-fold cross-validation, with 89.74% accuracy for recognizing '০-৯৯' (0-99) Bangla spoken numbers across the entire dataset. I hope this work will help you in some way. :) submitted by /u/PIASR0Y [link] [comments]  ( 1 min )
    [P] Improving mulitclass classification accuracy with Jain's Fairness Index
    This is a light implementation of the idea in the paper Leveraging Uncertainties in Softmax Decision-Making Models for Low-Power IoT Devices. Instead of finding uncertainties, I have added Jain's Fairness Index as an addition to the loss function. Gist: https://gist.github.com/Gananath/8d167384da7d3bc078650c73fab1a8dd submitted by /u/gananath [link] [comments]
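    For reference, Jain's Fairness Index of a non-negative vector x is (Σx)² / (n·Σx²); it is 1 for a perfectly uniform vector and 1/n when all mass sits on one entry. A small sketch:

```python
import numpy as np

def jain_index(x):
    # Jain's fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1]
    x = np.asarray(x, dtype=float)
    return x.sum() ** 2 / (len(x) * (x ** 2).sum())

print(jain_index(np.ones(4)))            # 1.0: perfectly uniform
print(jain_index([1.0, 0.0, 0.0, 0.0]))  # 0.25: all mass on one entry
```

    Applied to softmax outputs as a loss term, a high index penalizes over-confident (peaked) predictions, similar in spirit to an entropy regularizer.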
    [D] Are workshop papers considered "final publications"?
    Specifically, I'm talking about workshops of major conferences (NeurIPS, ICLR, ICML, etc.). If I submit a paper and it gets accepted, is that workshop paper a "final publication"? Or would most people expect the project to continue being developed into a slightly larger/longer paper for submission to the main stream of a conference? And if so, does publishing the earlier workshop paper tend to hinder or harm the later conference submission? I recognise there's a variety of workshops, and perhaps each have different expectations or norms. I'm wondering, from my outsider's perspective, how can I tell? For example, I have been thinking about submitting to one of these ICML workshops: https://icml-compbio.github.io/ or https://www.tagds.com/workshops/tag-in-machine-learning. Is there an easy way to tell whether either or both of these are "final publication" venues or not? submitted by /u/tfburns [link] [comments]  ( 2 min )
    [R] Maximum likelihood estimation can fail due to "Manifold Overfitting"
    arXiv: https://arxiv.org/abs/2204.07172 This paper out today seems to make the bold claim that maximum likelihood estimation is not a well-posed training objective in deep generative modelling. The manifold hypothesis says that observed high-dimensional data clusters around low-dimensional manifolds, but maximum likelihood methods (e.g. VAE, normalizing flows) learn high-dimensional densities. The paper argues that the mismatch between dimensionalities will lead to a problem called "manifold overfitting". Models are able to maximize likelihood in high-dimensions by sending the density to infinity around the low-dimensional manifold, but they can do this while completely ignoring the distribution of data on the manifold. So in other words, high capacity models will learn the data manifold…  ( 5 min )
  • Open

    "Reinforcement Learning with Action-Free Pre-Training from Videos", Seo et al 2022
    submitted by /u/gwern [link] [comments]
    "Inferring Rewards from Language in Context", Lin et al 202
    submitted by /u/gwern [link] [comments]
    Bandit problems as sequential decision problems
    Any reinforcement learning problem can be modeled as a sequential decision problem (SDP), which can always be modeled as a Markov decision process (need to model the state carefully). An example of an SDP is a multiarmed bandit problem, where the state is the vector of beliefs about the performance of each arm (or beliefs about a continuous parametric model). Decisions are made by a policy, and there are four classes of policies. For some reason, the RL community tends to focus on just one of the four classes (UCB policies, which fall in the class of cost function approximations), but there are entire communities using each of the other three classes. See chapter 7 of my new book (https://castlelab.princeton.edu/RLSO/) for a complete summary of the four classes of policies for pure learning problems (aka bandit problems). Note that Sutton and Barto (2nd edition) cover bandit problems in chapter 2, and then introduce MDPs in chapter 3. A bandit problem *is* an MDP! submitted by /u/powell-sda [link] [comments]  ( 1 min )
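    As a concrete anchor for one of the four classes: a minimal UCB1 sketch on a made-up three-armed Bernoulli bandit, where the "state" the post describes is exactly the running counts and reward sums (the beliefs about each arm):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical Bernoulli arm payoffs
counts = np.zeros(3)
sums = np.zeros(3)

for t in range(1, 2001):
    if t <= 3:                            # play each arm once to initialize
        arm = t - 1
    else:                                 # UCB1 index: mean + sqrt(2 ln t / n)
        arm = int(np.argmax(sums / counts + np.sqrt(2 * np.log(t) / counts)))
    reward = rng.binomial(1, true_means[arm])
    counts[arm] += 1
    sums[arm] += reward

print(counts)  # the best arm (mean 0.8) should dominate the pull counts
```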
    Getting started with UAV/drone control
    Hi, is it currently possible to train a UAV and implement the learned policy in real life? I understand there are different environments for training, e.g. AirSim, GymFC, and others. However, the interesting part for me is the link to the real world: is there a way to directly implement a learned policy on a real drone, e.g. a commercially available quadcopter? Which UAV would support such functionality? I'd love to get started on training drones for RL purposes (search and rescue, etc.), but if there is no way to test it in real life then this would be disappointing. submitted by /u/FrankTheThanks [link] [comments]  ( 1 min )
    Question about Expected Sarsa for prediction vs control
    I am having a hard time figuring out what makes the difference between Expected Sarsa for prediction vs for control. For off-policy Expected Sarsa, I believe it's possible to use one epsilon value for a target policy that is epsilon-greedy and another epsilon value for a behaviour policy that is epsilon-greedy. The target policy would be used within the expected value calculation in the update of Q(S,A), the action-value function, and the behaviour policy would be used to choose actions from the current state. But I'm not sure how to differentiate the control version of the algorithm from the prediction version. I think prediction usually finds the state-value function, but I know that online Sarsa for prediction uses Q(S,A), so I'm not sure how to determine the difference between prediction and control algorithms. submitted by /u/lifelifebalance [link] [comments]  ( 1 min )
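    For what it's worth, the update rule itself is shared; roughly, prediction evaluates a fixed target policy, while control lets the target policy track the greedy policy with respect to the current Q. A sketch of the shared update (shapes and numbers are made up):

```python
import numpy as np

def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, eps):
    # Target: r + gamma * E_pi[Q(s', A')] under an eps-greedy policy pi from Q.
    # Prediction: pi is a fixed policy being evaluated. Control: pi tracks argmax Q.
    n_actions = Q.shape[1]
    probs = np.full(n_actions, eps / n_actions)
    probs[np.argmax(Q[s_next])] += 1.0 - eps
    target = r + gamma * probs @ Q[s_next]
    Q[s, a] += alpha * (target - Q[s, a])

Q = np.zeros((2, 2))
Q[1] = [0.0, 1.0]
expected_sarsa_update(Q, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.9, eps=0.1)
print(Q[0, 0])  # 0.9275 = 0.5 * (1 + 0.9 * (0.05*0 + 0.95*1))
```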
    exploration strategies in discrete action spaces
    Hello there, I am working on a Missile Command game, and as a baseline I mostly use RLlib/PPO. The algorithm never converges; I suspect it is because of a lack of exploration. Since the timesteps are small, the target usually oscillates around the center of the screen, and it is impossible to explore going near the border and then explore firing (to counter an incoming missile). What methods should I try? Moreover, I have already done reward scaling and frame stacking. Any suggestions for solving this game are much appreciated. Last question: do you know of similar (and common) environments that have been solved? Maybe their solutions would show the path to follow. Thank you :) submitted by /u/Street_Excitement_14 [link] [comments]  ( 1 min )
    Confusion of hyperparameters in ppo
    I'm reading the PPO paper https://arxiv.org/abs/1707.06347 and I'm confused about one of the hyperparameters in Table 4: "Log stdev. of action distribution | LinearAnneal(-0.7, -1.6)". To the best of my knowledge, in the continuous setting the policy outputs a mean and a std, so why is the stdev of the action distribution given as a hyperparameter, and what is LinearAnneal in detail? submitted by /u/StrawberryTemporary7 [link] [comments]  ( 1 min )
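    On the second part of the question: LinearAnneal(a, b) just interpolates a scheduled value linearly from a to b over training; here the policy's state-independent log-std is treated as a scheduled constant rather than a network output, shrinking exploration noise over time. A tiny sketch:

```python
def linear_anneal(start, end, step, total_steps):
    # Interpolate a scheduled hyperparameter linearly from start to end
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

print(linear_anneal(-0.7, -1.6, 0, 1_000_000))          # -0.7 at the start
print(linear_anneal(-0.7, -1.6, 500_000, 1_000_000))    # -1.15 halfway
print(linear_anneal(-0.7, -1.6, 1_000_000, 1_000_000))  # -1.6 at the end
```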
    Need help about categorical dqn
    I don't know how the projection of TZ to match Z works, and I also don't understand the formula. Can someone do a step-by-step calculation as a demo? submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
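    Here is a minimal numeric sketch of the C51-style projection step (the support and values are made up): each shifted atom Tz_j = r + γ·z_j is clipped to [V_min, V_max], and its probability mass is split between the two nearest atoms of the fixed support in proportion to proximity:

```python
import numpy as np

def project_distribution(z, probs, r, gamma, v_min, v_max):
    # Project the shifted atoms Tz = r + gamma*z back onto the fixed support z
    n = len(z)
    dz = (v_max - v_min) / (n - 1)
    tz = np.clip(r + gamma * z, v_min, v_max)
    b = (tz - v_min) / dz                       # fractional index of each atom
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    m = np.zeros(n)
    for j in range(n):
        if lo[j] == hi[j]:                      # lands exactly on a support point
            m[lo[j]] += probs[j]
        else:                                   # split mass between neighbours
            m[lo[j]] += probs[j] * (hi[j] - b[j])
            m[hi[j]] += probs[j] * (b[j] - lo[j])
    return m

z = np.array([-1.0, 0.0, 1.0])
m = project_distribution(z, np.array([0.2, 0.5, 0.3]),
                         r=0.5, gamma=0.9, v_min=-1.0, v_max=1.0)
print(m, m.sum())  # the projected distribution still sums to 1
```

    Walking through the first atom: Tz = 0.5 + 0.9·(-1) = -0.4, so b = 0.6; its mass 0.2 is split as 0.2·0.4 to atom 0 and 0.2·0.6 to atom 1.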
  • Open

    Overview of Relational Graph Convolutional Networks (RGCN)
    submitted by /u/aidev2040 [link] [comments]
    This is a long shot, but does anyone remember...
    Hi, this is a very long shot. I have been trying to remember the name of a science TV show which aired in the UK back in the 90's. It focussed on Neural Networks and gave some brilliant examples of environmental sensing. There was also a section showing a simple voice synthesiser which "babbled" like a child. I thought it may have been an "Horizon" show, however, I have been through the list of shows from that time and none appear to be right. If anyone has a memory of this show please let me know. One of the visuals I remember was a plastic skull with an LED matrix inside showing patterns. Obviously this was just some smoke and mirrors, however, it may trigger a memory. I'm trying to recall something from best part of 30 years ago.. submitted by /u/_m0xya_ [link] [comments]  ( 1 min )
  • Open

    10 seats remaining | A series of live ML strategy workshops
    Sponsored Post Unlike traditional online courses, Foster Provost’s workshops will give you the chance to engage live with a world-class […] The post 10 seats remaining | A series of live ML strategy workshops appeared first on Machine Learning Mastery.  ( 2 min )
  • Open

    Learning to Prompt for Continual Learning
    Posted by Zifeng Wang, Student Researcher, and Zizhao Zhang, Software Engineer, Google Research Supervised learning is a common approach to machine learning (ML) in which the model is trained using data that is labeled appropriately for the task at hand. Ordinary supervised learning trains on independent and identically distributed (IID) data, where all training examples are sampled from a fixed set of classes, and the model has access to these examples throughout the entire training phase. In contrast, continual learning tackles the problem of training a single model on changing data distributions where different classification tasks are presented sequentially. This is particularly important, for example, to enable autonomous agents to process and interpret continuous streams of informati…  ( 7 min )
  • Open

    Integrate ServiceNow with Amazon Lex chatbot for ticket processing
    Conversational interfaces (or chatbots) can provide an intuitive interface for processes such as creating and monitoring tickets. Let’s consider a situation in which a recent hire on your team is required to cut tickets for office equipment. To do so, they have to interact with a ticketing software that the organization uses. This often requires […]  ( 10 min )
  • Open

    Inversion in a circle
    Inversion in the unit circle is a way of turning the circle inside-out. Everything that was inside the circle goes outside the circle, and everything that was outside the circle comes in. Not only is the disk turned inside-out, the same thing happens along each ray going out from the origin. Points on that ray […] Inversion in a circle first appeared on John D. Cook.  ( 2 min )
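    The map itself is one line: a point at distance r from the origin goes to distance 1/r along the same ray. A quick numeric sketch:

```python
def invert(x, y):
    # Inversion in the unit circle: (x, y) -> (x, y) / (x^2 + y^2)
    r2 = x * x + y * y
    return x / r2, y / r2

print(invert(2.0, 0.0))  # (0.5, 0.0): a point outside the circle maps inside
print(invert(0.5, 0.0))  # (2.0, 0.0): and the point inside maps back out
```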
  • Open

    Don’t let data drift derail edge compute machine learning models
    Edge computing has come of age, with deployments enabling many applications that process data from IoT sensors and cameras. In 2017, we identified the symbiotic relationship between edge computing and video analytics in an article, noting that live video analytics is the “killer app” for edge computing. Edge devices come in various shapes and sizes […] The post Don’t let data drift derail edge compute machine learning models appeared first on Microsoft Research.  ( 5 min )
  • Open

    A Quick Guide To Find The Right Minds For Annotation Is So Famous, But Why?
    Shared duties have always been the most critical component of every successful organization, regardless of its nature or size. When it…  ( 4 min )
  • Open

    Welcome ‘In the NVIDIA Studio’: A Weekly Celebration of Extraordinary Artists, Their Inspiring Art and Innovative Techniques
    Creating content is no longer tethered to using paint and stone as mediums, nor being in massive studios. Visual art can now be created anywhere, anytime. But being creative is still challenging and time-consuming. NVIDIA is making artistic workflows easier and faster by giving creators tools that enable them to remain in their flow state. Read article > The post Welcome ‘In the NVIDIA Studio’: A Weekly Celebration of Extraordinary Artists, Their Inspiring Art and Innovative Techniques appeared first on NVIDIA Blog.  ( 4 min )

  • Open

    whats your hopes and worry about future humaniod Artificial intelligence coming soon?
    submitted by /u/Upset_Force66 [link] [comments]
    AI Dream 36 - Psychedelic Special (4K 40Mbit Test)
    submitted by /u/LordPewPew777 [link] [comments]
    AI Startups and the Hunt for Tech Talent in Vietnam
    submitted by /u/regalalgorithm [link] [comments]
    We don't have echolocation
    submitted by /u/tezdhar [link] [comments]
    These 3-Michelin-starred plates were invented by AI. The food doesn’t even exist
    submitted by /u/jonfla [link] [comments]
    Getting in shape while homeworking by force locking the screen and using blazepose pose estimation to detect pushups to unlock it again.
    submitted by /u/ThePyCoder [link] [comments]
    Last Week in AI: AI chip startup funding doubled in the last 5 years, new AI applications in hospitals and restaurants, Cruise robotaxi pulled over by police in SF, and more!
    https://lastweekin.ai/p/163?s=w submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    Youtubers create a completely AI "influencer."
    submitted by /u/savetheattack [link] [comments]
    FOMO is a TinyML neural network for real-time object detection
    submitted by /u/bendee983 [link] [comments]
    An online course with an AI tutor achieves a significantly higher completion rate than traditional online courses thanks to a personalized learning experience.
    submitted by /u/much_successes [link] [comments]  ( 1 min )
    Protein Folding Neural Networks (e.g RoseTTAFold) Are Not Robust
    submitted by /u/qptbook [link] [comments]
    Witch of the Barthe
    submitted by /u/Hacknaut [link] [comments]
    Why is it called tensorflow and not matrixflow?
    Hello, I'm MB. A very nice and polite guy. Why is it called TensorFlow and not MatrixFlow? AI is all about matrix multiplications, right? So why use the word tensor instead? I know what a tensor is, kind of. But isn't AI primarily about matrix multiplications rather than tensor multiplications? ELI5 please. submitted by /u/MountBlanc [link] [comments]  ( 2 min )
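    One practical answer: a matrix is just a rank-2 tensor, and deep-learning data routinely has more axes than two, so the framework's basic object is the general n-dimensional array. A sketch with illustrative shapes:

```python
import numpy as np

image_batch = np.zeros((32, 224, 224, 3))  # rank-4 tensor: (batch, height, width, channels)
weights = np.zeros((3, 8))                 # a plain matrix is just a rank-2 tensor
out = image_batch @ weights                # one matmul, broadcast over the leading axes
print(out.shape)  # (32, 224, 224, 8)
```

    The inner operation is still a matrix multiplication, but the framework flows whole higher-rank tensors through it, hence the name.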
    Society
    submitted by /u/booksmoothie [link] [comments]
    Bioinspired multisensory neural network with crossmodal integration and recognition
    submitted by /u/booksmoothie [link] [comments]
    Realistic animal movement
    I am working on a robotic pet that has lots of movement capability but is simply scripted and will unnaturally jump between movement sets without considering the current movement. What branch of AI should I look into learning about? Currently I use mostly Python for the high level and C for the microcontrollers. submitted by /u/uMinded [link] [comments]
  • Open

    A3C vs federated learning?
    Hi, I see this question was asked before but I am still not convinced there is a difference between the two. How is asynchronous distributed RL (A3C) and federated learning different? It seems like the basic idea behind them is the same— the agents train in their own environments and only share gradients with the server. Is the difference only in terms of the domain they are applied in? Is it just ML vs RL? submitted by /u/uneasy_daisy [link] [comments]  ( 1 min )
    Can polyak averaging neural networks lead to numerical instability?
    In Soft Actor Critic several Q networks are used. Target Q networks are gradually updated to match the other Q networks. See step 15 here: https://spinningup.openai.com/en/latest/algorithms/sac.html#pseudocode I've heard this called Polyak averaging. Let's say we have two corresponding weights from two neural networks: W1 from one network, and W2 from the other. Polyak averaging averages these weights as follows: W_average = W1 * p + W2 * (1-p). When p is 0.5, it's an evenly weighted average. If p is high, then W1 is weighted more heavily than W2, etc. My question is: does this method of averaging weights lead to numerically unstable neural networks? This technique is often used to gradually transform one neural network into another on a weight-by-weight basis, but there is no guarantee that all intermediate neural networks are well behaved (at least, none that I'm aware of). Whereas gradient descent with small enough step sizes should, theoretically, keep a neural network well behaved, I don't think those same theoretical guarantees apply to Polyak averaging neural networks. What do you think? submitted by /u/Buttons840 [link] [comments]  ( 2 min )
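The update described in the question can be sketched in a few lines of NumPy (a toy illustration of the soft-update rule only, not tied to any particular SAC implementation; the weight values are made up):

```python
import numpy as np

def polyak_average(w1, w2, p):
    """Blend corresponding weights of two networks.

    p = 1.0 keeps w1 unchanged; p = 0.0 replaces it with w2.
    In SAC-style target updates, p is typically close to 1
    (e.g. 0.995), so the target network drifts slowly toward
    the online network.
    """
    return p * w1 + (1.0 - p) * w2

w_online = np.array([0.2, -1.3, 0.7])   # hypothetical online-network weights
w_target = np.array([0.1, -1.0, 0.9])   # hypothetical target-network weights

# One soft update step with p = 0.995
w_target = polyak_average(w_target, w_online, 0.995)
```

Each intermediate `w_target` lies on the straight line between the two weight vectors, which is exactly the point of the question: nothing guarantees the network defined by those interpolated weights behaves well.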
  • Open

    [D] Word Meaning Dictionary Dataset
    Hey all! So I intend to make an application that, very naively speaking, outputs synonyms of a given word regardless of context (e.g., if word1 is "bank", the model should output both "money" and "river", and the order does not matter). For this, I intend to use a Doc2Vec-type classifier, where the meanings of each word can serve as a document, and then similar words can easily be returned using a cosine similarity function. I chose this over classic Word2Vec as it will be able to handle uncommon words (which, blimey, the English language has a lot of) that would otherwise be processed as out-of-vocabulary tokens. To this end, I am searching for a suitable dataset. Any ideas? submitted by /u/GrammarPaparazzi [link] [comments]  ( 1 min )
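Whatever dataset ends up being used, the retrieval step described above reduces to cosine similarity between embedding vectors. A minimal NumPy sketch with made-up, hypothetical 3-dimensional embeddings (not from any trained Doc2Vec model):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings for two senses of "bank" and one candidate synonym
bank_money = np.array([0.9, 0.1, 0.0])
bank_river = np.array([0.1, 0.9, 0.1])
money      = np.array([0.8, 0.2, 0.1])

# "money" should land far closer to the financial sense than to the river sense
assert cosine_similarity(bank_money, money) > cosine_similarity(bank_river, money)
```

With one document (and hence one vector) per word *sense*, ranking all sense vectors by cosine similarity and returning their headwords gives the context-free synonym list the post describes.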
    [D] Is there a way to use a series of videos as the predictor variable for prediction/regression?
    This is the problem area I am working with: I have a series of videos taken at different times, and each video is paired with a physical variable. The videos contain information that correlates with the physical variable. What we want to do is use the information encoded within each video to build a correlation model with the physical quantity, and thereafter use new videos to predict the physical quantity. (We want to avoid the route of video -> CNN -> extract parameters -> build model with parameters. Instead, we want to directly go from the videos to the model without separately extracting parameters.) So, in a way, I want to use a series of videos as a time series data set. Is there a way to do this? What should be the starting point for my research into this? Thanks in advance! I am not an expert with this area at all, and would greatly appreciate guidance from the community. submitted by /u/besse [link] [comments]  ( 1 min )
    [P] Blog post + open-source PyTorch implementation of DeepMind's SIMONe (unsupervised scene decomposition)
    Hi all! My team recently reproduced and published a PyTorch implementation of the paper SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition. Our blog post walks through the code and provides a detailed explanation of the architecture they use in order to perform object segmentation on videos in a fully self-supervised manner. Hope this is helpful/interesting to others! submitted by /u/ai_ellie [link] [comments]  ( 1 min )
    [D] AutoRF vs SinNeRF
    Both approaches seem to be able to render complex scenes from a single view, without the need for explicit priors or pretrained feature extractors. Conveniently, AutoRF doesn't mention SinNeRF. What are the similarities and differences between the two approaches? DISCLAIMER - I'm not a NeRF expert. My limited understanding is that we train a small MLP to regress the radiance field for a scene, i.e., to predict emitted radiance at a point (x,y,z) in the viewing direction (θ, φ). Once we have the radiance field, we can use some rendering engine to render a 2D view from the 3D field and the camera parameters. EDIT: I just realized that I didn't link the papers, how silly of me. Here they are: SinNeRF: https://arxiv.org/abs/2204.00928 AutoRF: https://arxiv.org/abs/2204.03593 submitted by /u/Best-Neat-9439 [link] [comments]  ( 1 min )
    [P] Evaluating automatic paraphrasing via BLEU, LaBSE, Perplexity and Jaccard similarity index - how we do it for Linguix Paraphraser 2.0
    Hey everyone! Our NLP team, led by our expert Daria, has recently released a new AI-based paraphrasing feature – Linguix Paraphraser 2.0. To measure its quality, we use four metrics: BLEU, the Jaccard similarity index, LaBSE and Perplexity. Performance stats: BLEU is normally used for measuring the quality of machine translation; for a rephrasing task, the lower it is, the better. Right now, Linguix Paraphraser 2.0 has a BLEU score of 0.47 (the previous iteration had 0.65). So we can say that our paraphraser is now smarter: it uses more new words to rewrite the sentence, while the overall idea of the content is still preserved. The Jaccard similarity index is used to measure the likeness of two objects. As with BLEU, the lower the index for this task, the better. Our current score is 0.45 compared to 0.51 for the previous iteration. The LaBSE metric measures the semantic similarity of two sentences. It translates text into vectors so that vectors of texts close in meaning are geometrically close to each other. The higher the metric, the better. The new model has a LaBSE similarity slightly lower than the previous model: 0.80 vs 0.93, which is normal and correct, because the model generates a variety of variants using other words while keeping the meaning of the source text in the target. Perplexity is used to ensure the rewritten content sounds natural (lower perplexity is better). The naturalness of the rewrites generated by our new paraphraser is much better than before: 0.26 vs 4.99 for the prior version. https://i.redd.it/iaaf7o2iibu81.gif As such, for Linguix Paraphraser 2.0 we were able to improve the quality of the rephrased content while keeping the text meaning at the same level. P.S. Daria is somewhat shy, so I asked her to share the update here on her behalf. Anyway, she'll be pleased to see some feedback! submitted by /u/alexlash [link] [comments]  ( 1 min )
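For readers unfamiliar with it, the Jaccard index above is just shared tokens over total unique tokens between the source and the rewrite. A toy sketch over whitespace-split word sets (the tokenization Linguix actually uses is not described in the post, and the example sentences are made up):

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index over lowercase word sets: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

src = "the meeting was postponed until next week"
out = "the meeting was delayed until the following week"
score = jaccard_similarity(src, out)
```

A lower score means fewer surface words survive from source to rewrite, which is why a lower Jaccard (with meaning preserved, per LaBSE) counts as a better paraphrase here.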
    [R] [P] Slideflow: a deep learning framework for digital histology
    Hi all - I'm an applied ML researcher working in an oncology research lab at U Chicago, using digital slides of patients' tumors for tumor classification, prognostication, and treatment response prediction. I'm really excited to share with the community the deep learning tools we've been using, and I'm hoping for any feedback you might have (or direction if you think there's a community or subreddit this might be better suited for). After years of development, we've released our open-source deep learning framework for digital histology, Slideflow (https://github.com/jamesdolezal/slideflow). It has flexible and highly optimized whole-slide image processing, support for a wide variety of existing and custom architectures (with continuous, categorical, or time-series outcomes), real-time digital stain normalization, a number of explainability tools, and integrated uncertainty quantification. It's compatible with both Tensorflow and PyTorch, available on PyPI and DockerHub, and comes with good documentation (https://slideflow.dev/). We've tried out a number of alternative frameworks over the years, and I think the ease of use, flexibility, and performance optimizations set it apart from other repos you'll find on GitHub. We have a handful of local collaborators who are using Slideflow, but I'm hoping to expand our reach and find people in similar fields who are interested in collaborating for ongoing open-source development. I've tried looking for subreddits relating specifically to computational pathology / digital histology, and haven't found a good community yet - anyone have ideas for how to get connected with like-minded people working in the same field? submitted by /u/shawarma_bees [link] [comments]  ( 2 min )
    [D] Anyone using named tensors or a tensor annotation lib productively?
    It seems like there have been some options out for a while now - e.g. native pytorch named tensors, tsalib, torchtyping - yet I haven't really seen them discussed or used in any code I've come across. Just wondering if anyone has surveyed them recently and is using them. In particular tsalib's warp string syntax for transformations looks really interesting. submitted by /u/patniemeyer [link] [comments]  ( 1 min )
    [D] Are there any analog A.I. computing chips on the retail market yet?
    If so, where can I buy them? (For example: I read that Mythic was raising funding in mid-2021, but I don't know if their chips are for sale anywhere.) submitted by /u/GerritTheBerrit [link] [comments]  ( 2 min )
    [N][R][P] High fidelity 3D face reconstruction from monocular image
    NextFace is an open source PyTorch library for high fidelity 3D face reconstruction from single/multiple RGB image(s). github.com/abdallahdib/NextFace https://reddit.com/link/u6e7cd/video/ixg0wlzirau81/player submitted by /u/Abd_dib [link] [comments]
    [D] Which keywords describe my task?
    Hey all, I have received a task in an area I am unfamiliar with and need a little help finding suitable papers, so I am looking for keywords. To illustrate the goal, let's say you have 10000 screws (which can be of the same model) and you want to be able to recognize/match each individual one. New screws are added all the time, so you also need to handle the case where the object is previously unknown when performing the match. The goal is to develop a capturing system that produces suitable images and to find an architecture/algorithm that is as robust as possible. The object images should be invariant to illumination, rotation and translation during acquisition. It should be a kind of barcode/hash without any additional symbol, based only on the structure of the object. Is there a name for such a task? I think it is not really classification in the classical sense. I guess it might be just a clever way of finding suitable features for each individual object structure and a suitable distance function. Sorry for the long post, I appreciate any help. submitted by /u/Temporary_Lab769 [link] [comments]  ( 1 min )
    [D] Including outer objects in RNN / CNN
    Hello there, which layer or structure would you append to an existing machine learning architecture like YOLOv5 in order to not only detect a specific object, but also the object it is part of? Let's say there are X-ray images of laptops: the laptop itself will be detected, and also something like the hard drive or battery inside of it. Is it possible to make the CNN/RNN aware of the fact that the hard drive or battery is inside the laptop? Hope someone can tell what I mean. Regards, David submitted by /u/rohrivibes [link] [comments]  ( 1 min )
    [D] PhD in knowledge representation and reasoning for autonomous agent: research landscape
    I have been offered a PhD in the domain of knowledge representation and reasoning for autonomous agents. The goal is to represent textual rules and world knowledge, and then use that represented knowledge for reasoning, so that the motion of an autonomous agent can be predicted. I have a question regarding the current landscape of knowledge representation and reasoning. I see more and more work on data-focused models, while classical logic and associated paths are fading out. The PhD project problem itself looks interesting, as it focuses on work where there will be less need for data and motion can be planned in unseen scenarios. But I am concerned about future career prospects in a domain where the problem is tackled by knowledge representation and reasoning, as I can see there is less and less funding in it. What is your take on the future landscape of research directions in this domain? submitted by /u/human_treadstone [link] [comments]  ( 2 min )
    [P] My blog on ML model evaluation (Bayes optimal decisions, ROC curve, LLR calibration)
    I have published 3 articles about ML model evaluation on my personal blog. I just finished the 3rd installment, so I am keen to share and get some feedback. I cover frameworks traditionally used in ML, like ROC curves, but from a Bayes decision perspective, which I have been struggling to find in textbooks/tutorials. The 3rd part is about the evaluation of log-likelihood calibrated models. Hope you will find it interesting/useful! https://mkffl.github.io/2021/10/18/Decisions-Part-1.html https://mkffl.github.io/2021/10/28/Decisions-Part-2.html https://mkffl.github.io/2022/03/02/Decisions-Part-3.html And the underlying code for reproducibility: https://github.com/mkffl/decisions submitted by /u/mkffl [link] [comments]  ( 1 min )
    [P] ormb: Docker for Your Models, Help You Manage Models Better
    github.com/kleveross/ormb ormb helps you manage your Machine Learning/Deep Learning models with a docker container image registry. It makes your models easy to create, version, share and publish.

```
# Save the model in the local cache first
$ ormb save gaocegege/fashion_model:v1
ref:     gaocegege/fashion_model:v1
digest:  6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
size:    162.1 KiB
format:  SavedModel
v1: saved

# Push the model from the local cache to the remote registry
$ ormb push gaocegege/fashion_model:v1
The push refers to repository [gaocegege/fashion_model]
ref:     gaocegege/fashion_model:v1
digest:  6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
size:    162.1 KiB
format:  SavedModel
v1: pushed to remote (1 layer, 162.1 KiB total)

# Pull the model from the remote registry to the local cache
$ ormb pull gaocegege/fashion_model:v1
v1: Pulling from gaocegege/fashion_model
ref:     gaocegege/fashion_model:v1
digest:  6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
size:    162.1 KiB
Status: Downloaded newer model for gaocegege/fashion_model:v1

# Export the model from the local cache to the current directory
$ ormb export gaocegege/fashion_model:v1
ref:     localhost/gaocegege/fashion_model:v1
digest:  6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
size:    162.1 KiB

# View the local file directory
$ tree examples/SavedModel-fashion
examples/SavedModel-fashion
├── model
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00001
│       └── variables.index
├── ormbfile.yaml
└── training-serving.ipynb

2 directories, 5 files
```

submitted by /u/gaocegege [link] [comments]  ( 1 min )
    [D] Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection
    Hi, I have just published my latest Medium article. Anomalies are widespread in real-world data, and they become especially important in time series, so it is crucial to have efficient methods to detect and deal with them. This article illustrates a state-of-the-art model called DGHL for anomaly detection. DGHL uses a ConvNet as a generator and, instead of an encoder, maximizes the likelihood with the Alternating Back-Propagation algorithm. https://rezayazdanfar.medium.com/deep-generative-model-with-hierarchical-latent-factors-for-time-series-anomaly-detection-8d6eaebad8bc submitted by /u/rezayazdanfar [link] [comments]  ( 1 min )
    [P] app to play with latent diffusion models
    just published “geni”, a new minimal app that uses Latent Diffusion Models. It will not produce DALL-E-ish results but it’s fast and great for playing with prompt engineering. Also, it’s free. would love to have the community playing with it. check it out here: https://geni.vercel.app submitted by /u/viccpopa [link] [comments]  ( 1 min )
    [R] VQ-Flows: Vector Quantized Local Normalizing Flows
    arXiv: https://arxiv.org/abs/2203.11556 Summary: We introduce a novel statistical framework for learning a mixture of local normalizing flows as "chart maps" over the data manifold. Our framework augments the expressivity of recent approaches while preserving the signature property of normalizing flows, that they admit exact density evaluation. We learn a suitable atlas of charts for the data manifold via a vector quantized auto-encoder (VQ-AE) and the distributions over them using a conditional flow. We validate experimentally that our probabilistic framework enables existing approaches to better model data distributions over complex manifolds. GitHub: Coming soon. Author here, happy to answer any questions. submitted by /u/tshrjn [link] [comments]  ( 1 min )
  • Open

    Does anyone have a guess as to why my network isn’t working? (more info in comments)
    submitted by /u/-i-hate-this-place- [link] [comments]  ( 1 min )
  • Open

    Guide to Iteratively Tuning GNNs
    Sponsored Post By Luis Bermudez This blog walks through a process for experimenting with hyperparameters, training algorithms and other parameters […] The post Guide to Iteratively Tuning GNNs appeared first on Machine Learning Mastery.  ( 6 min )
    Managing Data for Machine Learning Project
    Big data, labeled data, noisy data. Machine learning projects all need to look at data. Data is a critical aspect […] The post Managing Data for Machine Learning Project appeared first on Machine Learning Mastery.  ( 30 min )
  • Open

    7 Tips for making your code more ‘pythonic’ and elegant
    7 use-cases where you can make your python code more nifty, concise and elegant — without compromising readability. Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 4 min )
  • Open

    AI and Healthcare: AI as a Triaging Tool for Healthcare
    Healthcare offers one of the biggest areas where AI could impact people. AI in healthcare is already widespread but is expected to grow even further. The global artificial intelligence in healthcare market size was valued at USD 10.4 billion in 2021. It is expected to expand at a compound annual growth rate (CAGR) of 38.4%… Read More »AI and Healthcare: AI as a Triaging Tool for Healthcare The post AI and Healthcare: AI as a Triaging Tool for Healthcare appeared first on Data Science Central.  ( 3 min )
    Fallacy of Becoming Data-driven – Part 2: Cultural Transformation
    In my first blog of the series “Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed”, I preached about the critical importance of reframing the conversion away from data-driven to becoming value-obsessed. Instead of focusing on becoming value-driven, organizations need to focus on how to uncover the customer, product, service, and operational insights buried in… Read More »Fallacy of Becoming Data-driven – Part 2: Cultural Transformation The post Fallacy of Becoming Data-driven – Part 2: Cultural Transformation appeared first on Data Science Central.  ( 5 min )
    A Glossary of Knowledge Graph Terms
    As with many fields, knowledge graphs boast a wide array of specialized terms. This guide provides a handy reference to these concepts. Resource Description Framework (RDF) The Resource Description Framework (or RDF) is a conceptual framework established in the early 2000s by the World Wide Web Consortium for describing sets of interrelated assertions. RDF breaks… Read More »A Glossary of Knowledge Graph Terms The post A Glossary of Knowledge Graph Terms appeared first on Data Science Central.  ( 10 min )
  • Open

    Auto-Gait: Automatic Ataxia Risk Assessment with Computer Vision on Gait Task Videos. (arXiv:2203.08215v2 [cs.CV] UPDATED)
    In this paper, we investigated whether we can 1) detect participants with ataxia-specific gait characteristics (risk-prediction), and 2) assess severity of ataxia from gait (severity-assessment) using computer vision. We created a dataset of 155 videos from 89 participants, 24 controls and 65 diagnosed with (or are pre-manifest) spinocerebellar ataxias (SCAs), performing the gait task of the Scale for the Assessment and Rating of Ataxia (SARA) from 11 medical sites located in 8 different states across the United States. We develop a computer vision pipeline to detect, track, and separate out the participants from their surroundings and construct several features from their body pose coordinates to capture gait characteristics like step width, step length, swing, stability, speed, etc. Our risk-prediction model achieves 83.06% accuracy and an 80.23% F1 score. Similarly, our severity-assessment model achieves a mean absolute error (MAE) score of 0.6225 and a Pearson's correlation coefficient score of 0.7268. Our models still performed competitively when evaluated on data from sites not used during training. Furthermore, through feature importance analysis, we found that our models associate wider steps, decreased walking speed, and increased instability with greater ataxia severity, which is consistent with previously established clinical knowledge. Our models create possibilities for remote ataxia assessment in non-clinical settings in the future, which could significantly improve accessibility of ataxia care. Furthermore, our underlying dataset was assembled from a geographically diverse cohort, highlighting its potential to further increase equity. The code used in this study is open to the public, and the anonymized body pose landmark dataset is also available upon request.
    Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer. (arXiv:2204.07537v1 [cs.CV])
    Though deep generative models have gained a lot of attention, most of the existing works are designed for the unimodal generation task. In this paper, we explore a new method for unconditional image-text pair generation. We propose MXQ-VAE, a vector quantization method for multimodal image-text representation. MXQ-VAE accepts a paired image and text as input, and learns a joint quantized representation space, so that the image-text pair can be converted to a sequence of unified indices. Then we can use autoregressive generative models to model the joint image-text representation, and even perform unconditional image-text pair generation. Extensive experimental results demonstrate that our approach effectively generates semantically consistent image-text pairs and also enhances meaningful alignment between image and text.  ( 2 min )
    Effects of Multi-Aspect Online Reviews with Unobserved Confounders: Estimation and Implication. (arXiv:2110.01746v2 [cs.LG] UPDATED)
    Online review systems are the primary means through which many businesses seek to build the brand and spread their messages. Prior research studying the effects of online reviews has been mainly focused on a single numerical cause, e.g., ratings or sentiment scores. We argue that such notions of causes entail three key limitations: they solely consider the effects of single numerical causes and ignore different effects of multiple aspects -- e.g., Food, Service -- embedded in the textual reviews; they assume the absence of hidden confounders in observational studies, e.g., consumers' personal preferences; and they overlook the indirect effects of numerical causes that can potentially cancel out the effect of textual reviews on business revenue. We thereby propose an alternative perspective to this single-cause-based effect estimation of online reviews: in the presence of hidden confounders, we consider multi-aspect textual reviews, particularly, their total effects on business revenue and direct effects with the numerical cause -- ratings -- being the mediator. We draw on recent advances in machine learning and causal inference to together estimate the hidden confounders and causal effects. We present empirical evaluations using real-world examples to discuss the importance and implications of differentiating the multi-aspect effects in strategizing business operations.  ( 2 min )
    Multi-domain Integrative Swin Transformer network for Sparse-View Tomographic Reconstruction. (arXiv:2111.14831v7 [eess.IV] UPDATED)
    Decreasing projection views to lower X-ray radiation dose usually leads to severe streak artifacts. To improve image quality from sparse-view data, a Multi-domain Integrative Swin Transformer network (MIST-net) was developed in this article. First, MIST-net incorporated lavish domain features from data, residual-data, image, and residual-image using flexible network architectures, where the residual-data and residual-image sub-networks were considered as data consistency modules to eliminate interpolation and reconstruction errors. Second, a trainable edge enhancement filter was incorporated to detect and protect image edges. Third, a high-quality reconstruction Swin transformer (i.e., Recformer) was designed to capture image global features. The experiment results on numerical and real cardiac clinical datasets with 48 views demonstrated that our proposed MIST-net provided better image quality with more small features and sharp edges than other competitors.
    GCR: Gradient Coreset Based Replay Buffer Selection For Continual Learning. (arXiv:2111.11210v3 [cs.LG] UPDATED)
    Continual learning (CL) aims to develop techniques by which a single model adapts to an increasing number of tasks encountered sequentially, thereby potentially leveraging learnings across tasks in a resource-efficient manner. A major challenge for CL systems is catastrophic forgetting, where earlier tasks are forgotten while learning a new task. To address this, replay-based CL approaches maintain and repeatedly retrain on a small buffer of data selected across encountered tasks. We propose Gradient Coreset Replay (GCR), a novel strategy for replay buffer selection and update using a carefully designed optimization criterion. Specifically, we select and maintain a "coreset" that closely approximates the gradient of all the data seen so far with respect to current model parameters, and discuss key strategies needed for its effective application to the continual learning setting. We show significant gains (2%-4% absolute) over the state-of-the-art in the well-studied offline continual learning setting. Our findings also effectively transfer to online / streaming CL settings, showing upto 5% gains over existing approaches. Finally, we demonstrate the value of supervised contrastive loss for continual learning, which yields a cumulative gain of up to 5% accuracy when combined with our subset selection strategy.
    Learning to Accelerate by the Methods of Step-size Planning. (arXiv:2204.01705v3 [cs.LG] UPDATED)
    Gradient descent is slow to converge for ill-conditioned problems and non-convex problems. An important technique for acceleration is step-size adaptation. The first part of this paper contains a detailed review of step-size adaptation methods, including Polyak step-size, L4, LossGrad, Adam, IDBD, and Hypergradient descent, and the relation of step-size adaptation to meta-gradient methods. In the second part of this paper, we propose a new class of methods of accelerating gradient descent that have some distinctiveness from existing techniques. The new methods, which we call {\em step-size planning}, use the {\em update experience} to learn an improved way of updating the parameters. The methods organize the experience into $K$ steps away from each other to facilitate planning. From the past experience, our planning algorithm, Csawg, learns a step-size model which is a form of multi-step machine that predicts future updates. We extend Csawg to apply step-size planning over multiple steps, which leads to further speedup. We discuss and highlight the projection power of the diagonal-matrix step-size for future large scale applications. We show for a convex problem, our methods can surpass the convergence rate of Nesterov's accelerated gradient, $1 - \sqrt{\mu/L}$, where $\mu, L$ are the strongly convex factor of the loss function $F$ and the Lipschitz constant of $F'$, which is the theoretical limit for the convergence rate of first-order methods. On the well-known non-convex Rosenbrock function, our planning methods achieve zero error below 500 gradient evaluations, while gradient descent takes about 10000 gradient evaluations to reach a $10^{-3}$ accuracy. We discuss the connection of step-size planning to planning in reinforcement learning, in particular, Dyna architectures.  ( 2 min )
    Transfer Learning for Instance Segmentation of Waste Bottles using Mask R-CNN Algorithm. (arXiv:2204.07437v1 [cs.CV])
    This paper proposes a methodological approach with a transfer learning scheme for plastic waste bottle detection and instance segmentation using the mask region proposal convolutional neural network (Mask R-CNN). Plastic bottles constitute one of the major pollutants posing a serious threat to the environment both in oceans and on land. The automated identification and segregation of bottles can facilitate plastic waste recycling. We prepare a custom-made dataset of 192 bottle images with pixel-by-pixel polygon annotation for the automatic segmentation task. The proposed transfer learning scheme makes use of a Mask R-CNN model pre-trained on the Microsoft COCO dataset. We present a comprehensive scheme for fine-tuning the base pre-trained Mask R-CNN model on our custom dataset. Our final fine-tuned model has achieved a mean average precision (mAP) of 59.4, which corresponds to the MS COCO metric. The results indicate a promising application of deep learning for detecting waste bottles.  ( 2 min )
    Big-means: Less is More for K-means Clustering. (arXiv:2204.07485v1 [cs.LG])
    K-means clustering plays a vital role in data mining. However, its performance drastically drops when applied to huge amounts of data. We propose a new heuristic that is built on the basis of regular K-means for faster and more accurate big data clustering using the "less is more" and MSSC decomposition approaches. The main advantage of the proposed algorithm is that it naturally turns the K-means local search into global one through the process of decomposition of the MSSC problem. On one hand, decomposition of the MSSC problem into smaller subproblems reduces the computational complexity and allows for their parallel processing. On the other hand, the MSSC decomposition provides a new method for the natural data-driven shaking of the incumbent solution while introducing a new neighborhood structure for the solution of the MSSC problem. This leads to a new heuristic that improves K-means in big data conditions. The scalability of the algorithm to big data can be easily adjusted by choosing the appropriate number of subproblems and their size. The proposed algorithm is both scalable and accurate. In our experiments it outperforms all recent state-of-the-art algorithms for the MSSC in terms of time as well as the solution quality.  ( 2 min )
    Towards PAC Multi-Object Detection and Tracking. (arXiv:2204.07482v1 [cs.CV])
    Accurately detecting and tracking multi-objects is important for safety-critical applications such as autonomous navigation. However, it remains challenging to provide guarantees on the performance of state-of-the-art techniques based on deep learning. We consider a strategy known as conformal prediction, which predicts sets of labels instead of a single label; in the classification and regression settings, these algorithms can guarantee that the true label lies within the prediction set with high probability. Building on these ideas, we propose multi-object detection and tracking algorithms that come with probably approximately correct (PAC) guarantees. They do so by constructing both a prediction set around each object detection as well as around the set of edge transitions; given an object, the detection prediction set contains its true bounding box with high probability, and the edge prediction set contains its true transition across frames with high probability. We empirically demonstrate that our method can detect and track objects with PAC guarantees on the COCO and MOT-17 datasets.
    A Reinforcement Learning Approach to Parameter Selection for Distributed Optimal Power Flow. (arXiv:2110.11991v2 [eess.SY] UPDATED)
    With the increasing penetration of distributed energy resources, distributed optimization algorithms have attracted significant attention for power systems applications due to their potential for superior scalability, privacy, and robustness to a single point-of-failure. The Alternating Direction Method of Multipliers (ADMM) is a popular distributed optimization algorithm; however, its convergence performance is highly dependent on the selection of penalty parameters, which are usually chosen heuristically. In this work, we use reinforcement learning (RL) to develop an adaptive penalty parameter selection policy for the AC optimal power flow (ACOPF) problem solved via ADMM with the goal of minimizing the number of iterations until convergence. We train our RL policy using deep Q-learning, and show that this policy can result in significantly accelerated convergence (up to a 59% reduction in the number of iterations compared to existing, curvature-informed penalty parameter selection methods). Furthermore, we show that our RL policy demonstrates promise for generalizability, performing well under unseen loading schemes as well as under unseen losses of lines and generators (up to a 50% reduction in iterations). This work thus provides a proof-of-concept for using RL for parameter selection in ADMM for power systems applications.
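    The shape of the RL loop can be illustrated with tabular Q-learning on a deliberately tiny toy: the state is a discretized (log-scale) penalty parameter, actions shift it down, keep it, or shift it up, and the reward is the negative distance to an ideal setting, standing in for the negative ADMM iteration count the paper minimizes. The environment, state space, and reward here are invented for illustration; the paper uses deep Q-learning on the actual ACOPF/ADMM iterations.

```python
import random

def train_penalty_policy(target=3, n_states=7, episodes=3000, seed=0):
    """Tabular Q-learning toy for penalty-parameter selection.
    Actions: 0 = decrease, 1 = keep, 2 = increase the (discretized) penalty."""
    rng = random.Random(seed)
    actions = [-1, 0, 1]
    Q = [[0.0] * 3 for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(10):
            # Epsilon-greedy action selection.
            if rng.random() < 0.2:
                a = rng.randrange(3)
            else:
                a = max(range(3), key=lambda i: Q[s][i])
            s2 = min(n_states - 1, max(0, s + actions[a]))
            r = -abs(s2 - target)       # fewer "iterations" near the target
            # Standard Q-learning update (lr = 0.1, gamma = 0.9).
            Q[s][a] += 0.1 * (r + 0.9 * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy should push the penalty toward the ideal setting from either side.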
    Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs. (arXiv:2111.06483v3 [cs.LG] UPDATED)
    We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large-scale training of GNNs has recently been dominated by sampling-based methods and methods based on non-learnable message passing. SAR on the other hand is a distributed technique that can train any GNN type directly on an entire large graph. The key innovation in SAR is the distributed sequential rematerialization scheme which sequentially re-constructs then frees pieces of the prohibitively large GNN computational graph during the backward pass. This results in excellent memory scaling behavior where the memory consumption per worker goes down linearly with the number of workers, even for densely connected graphs. Using SAR, we report the largest applications of full-batch GNN training to-date, and demonstrate large memory savings as the number of workers increases. We also present a general technique based on kernel fusion and attention-matrix rematerialization to optimize both the runtime and memory efficiency of attention-based models. We show that, coupled with SAR, our optimized attention kernels lead to significant speedups and memory savings in attention-based GNNs. We made the SAR GNN training library publicly available: \url{https://github.com/IntelLabs/SAR}.
    Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records. (arXiv:2203.06918v2 [cs.CL] UPDATED)
    Question Answering on Electronic Health Records (EHR-QA) has a significant impact on the healthcare domain, and it is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into a query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to the data types pre-defined by the specific query language. In order to expand the EHR-QA task beyond this limitation to handle multi-modal medical data and solve complex inference in the future, a more primitive, systemic language is needed. In this paper, we design a program-based model (NLQ2Program) for EHR-QA as the first step towards this future direction. We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner in order to overcome the absence of gold programs. Without the gold program, our proposed model shows comparable performance to the previous state-of-the-art model, which is an NLQ2Query model (0.9% gain). In addition, for a reliable EHR-QA model, we apply an uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirmed that data uncertainty is most indicative of the ambiguity in the input question.
    The Importance of Landscape Features for Performance Prediction of Modular CMA-ES Variants. (arXiv:2204.07431v1 [cs.NE])
    Selecting the most suitable algorithm and determining its hyperparameters for a given optimization problem is a challenging task. Accurately predicting how well a certain algorithm will solve the problem is hence desirable. Recent studies in single-objective numerical optimization show that supervised machine learning methods can predict algorithm performance using landscape features extracted from the problem instances. Existing approaches typically treat the algorithms as black boxes, without considering their characteristics. To investigate whether a selection of landscape features that depends on algorithm properties could further improve regression accuracy, we consider the modular CMA-ES framework and estimate how much each landscape feature contributes to the best algorithm performance regression models. Exploratory data analysis performed on this data indicates that the set of most relevant features does not depend on the configuration of individual modules, but the influence these features have on regression accuracy does. In addition, we show that classifiers that take feature relevance into account are able to predict the status of individual modules in the CMA-ES configurations.
    Two-Step Meta-Learning for Time-Series Forecasting Ensemble. (arXiv:2011.10545v2 [stat.ML] UPDATED)
    The amount of historical data collected keeps increasing, and business intelligence applications with automatic time-series forecasting are in high demand. While no single time-series modeling method is universal to all types of dynamics, forecasting with an ensemble of several methods is often seen as a compromise. Instead of fixing ensemble diversity and size, we propose to predict these aspects adaptively using meta-learning. Meta-learning here considers two separate random forest regression models, built on 390 time-series features, to rank 22 univariate forecasting methods and recommend the ensemble size. The forecasting ensemble is then formed from the best-ranked methods, and forecasts are pooled using either a simple or a weighted average (with weights corresponding to reciprocal ranks). The proposed approach was tested on 12561 micro-economic time series (expanded to 38633 for various forecasting horizons) from the M4 competition, where meta-learning outperformed the Theta and Comb benchmarks by relative forecasting error for all data types and horizons. The best overall results were achieved by weighted pooling, with a symmetric mean absolute percentage error of 9.21% versus 11.05% obtained using the Theta method.
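    The reciprocal-rank weighted pooling described above is straightforward to write down. The sketch below is a minimal reconstruction, not the authors' code; the forecast values and rank handling are illustrative.

```python
def weighted_pool(forecasts, ranks):
    """Pool forecasts from the selected methods with weights equal to the
    reciprocal of each method's meta-learned rank (rank 1 = best).

    forecasts: list of per-method forecast sequences, one value per horizon.
    ranks:     list of positive integer ranks, aligned with `forecasts`.
    """
    weights = [1.0 / r for r in ranks]
    total = sum(weights)
    return [
        sum(w * f[h] for w, f in zip(weights, forecasts)) / total
        for h in range(len(forecasts[0]))
    ]
```

Simple-average pooling is the special case where every rank is 1.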
    On the Importance of Firth Bias Reduction in Few-Shot Classification. (arXiv:2110.02529v2 [cs.CV] UPDATED)
    Learning accurate classifiers for novel categories from very few examples, known as few-shot image classification, is a challenging task in statistical machine learning and computer vision. The performance in few-shot classification suffers from the bias in the estimation of classifier parameters; however, an effective underlying bias reduction technique that could alleviate this issue in training few-shot classifiers has been overlooked. In this work, we demonstrate the effectiveness of Firth bias reduction in few-shot classification. Theoretically, Firth bias reduction removes the $O(N^{-1})$ first order term from the small-sample bias of the Maximum Likelihood Estimator. Here we show that the general Firth bias reduction technique simplifies to encouraging uniform class assignment probabilities for multinomial logistic classification, and almost has the same effect in cosine classifiers. We derive an easy-to-implement optimization objective for Firth penalized multinomial logistic and cosine classifiers, which is equivalent to penalizing the cross-entropy loss with a KL-divergence between the uniform label distribution and the predictions. Then, we empirically evaluate that it is consistently effective across the board for few-shot image classification, regardless of (1) the feature representations from different backbones, (2) the number of samples per class, and (3) the number of classes. Finally, we show the robustness of Firth bias reduction, in the case of imbalanced data distribution. Our implementation is available at https://github.com/ehsansaleh/firth_bias_reduction
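    The penalized objective described above (cross-entropy plus a KL divergence from the uniform label distribution to the predictions) is easy to state concretely. The sketch below is a plain-Python rendering for a single example, not the authors' released code, and the coefficient `lam` is a placeholder for their tuned penalty strength.

```python
import math

def softmax(logits):
    m = max(logits)                       # stabilize the exponentials
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def firth_penalized_ce(logits, label, lam=1.0):
    """Cross-entropy plus a Firth-style penalty: KL(uniform || predictions),
    which encourages uniform class-assignment probabilities and reduces
    (up to an additive constant) to the mean negative log-probability."""
    p = softmax(logits)
    ce = -math.log(p[label])
    C = len(logits)
    kl_uniform = sum((1.0 / C) * math.log((1.0 / C) / pc) for pc in p)
    return ce + lam * kl_uniform
```

The penalty vanishes when the predicted distribution is uniform and grows as the prediction becomes confident, which is the bias-reducing pull toward uniform class assignments the paper derives.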
    Efficient Architecture Search for Diverse Tasks. (arXiv:2204.07554v1 [cs.LG])
    While neural architecture search (NAS) has enabled automated machine learning (AutoML) for well-researched areas, its application to tasks beyond computer vision is still under-explored. As less-studied domains are precisely those where we expect AutoML to have the greatest impact, in this work we study NAS for efficiently solving diverse problems. Seeking an approach that is fast, simple, and broadly applicable, we fix a standard convolutional network (CNN) topology and propose to search for the right kernel sizes and dilations its operations should take on. This dramatically expands the model's capacity to extract features at multiple resolutions for different types of data while only requiring search over the operation space. To overcome the efficiency challenges of naive weight-sharing in this search space, we introduce DASH, a differentiable NAS algorithm that computes the mixture-of-operations using the Fourier diagonalization of convolution, achieving both a better asymptotic complexity and an up-to-10x search time speedup in practice. We evaluate DASH on NAS-Bench-360, a suite of ten tasks designed for benchmarking NAS in diverse domains. DASH outperforms state-of-the-art methods in aggregate, attaining the best-known automated performance on seven tasks. Meanwhile, on six of the ten tasks, the combined search and retraining time is less than 2x slower than simply training a CNN backbone that is far less accurate.
    CryoRL: Reinforcement Learning Enables Efficient Cryo-EM Data Collection. (arXiv:2204.07543v1 [cs.LG])
    Single-particle cryo-electron microscopy (cryo-EM) has become one of the mainstream structural biology techniques because of its ability to determine high-resolution structures of dynamic bio-molecules. However, cryo-EM data acquisition remains expensive and labor-intensive, requiring substantial expertise. Structural biologists need a more efficient and objective method to collect the best data in a limited time frame. We formulate the cryo-EM data collection task as an optimization problem in this work. The goal is to maximize the total number of good images taken within a specified period. We show that reinforcement learning offers an effective way to plan cryo-EM data collection, successfully navigating heterogeneous cryo-EM grids. The approach we developed, cryoRL, demonstrates better performance than average users for data collection under similar settings.
    Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters. (arXiv:2204.07447v1 [cs.CL])
    Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs. While early work identified certain biases in NLI models, recent advancements in modeling and datasets demonstrated promising performance. In this work, we further explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on. First, we analyze the robustness of these models to longer and out-of-domain inputs. Then, we develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset. Interestingly, we find NLI scores to provide strong retrieval signals, leading to more relevant evidence extractions compared to common similarity-based methods. Finally, we go further and investigate whole document clusters to identify both discrepancies and consensus among sources. In a test case, we find real inconsistencies between Wikipedia pages in different languages about the same topic.
    Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation. (arXiv:2106.05527v4 [cs.LG] UPDATED)
    Recent advances in diffusion models bring state-of-the-art performance on image generation tasks. However, empirical results from previous research on diffusion models imply that there is an inverse correlation between performance on density estimation and on sample generation. This paper shows that the inverse correlation arises because density estimation is mostly contributed by small diffusion times, whereas sample generation mainly depends on large diffusion times. However, training the score network on both small and large diffusion times is demanding because of the loss imbalance issue. To successfully train the score network on both, this paper introduces a training technique, Soft Truncation, that softens the truncation time for every mini-batch update and is universally applicable to any type of diffusion model. It turns out that Soft Truncation is equivalent to a diffusion model with a general weight, and we prove the variational bound of the generally weighted diffusion model. In view of this variational bound, Soft Truncation becomes a natural way to train the score network. In experiments, Soft Truncation achieves state-of-the-art performance on the CIFAR-10, CelebA, CelebA-HQ $256\times 256$, and STL-10 datasets.
    Deep learning model solves change point detection for multiple change types. (arXiv:2204.07403v1 [cs.LG])
    Change point detection aims to catch abrupt shifts in a data distribution. Common approaches assume that there are only two fixed distributions for the data: one before and another after the change point. Real-world data are richer than this assumption: there can be multiple different distributions before and after a change. We propose an approach that works in this multiple-distributions scenario. Our approach learns representations of semi-structured data suitable for change point detection, where a common classifier-based approach fails. Moreover, our model is more robust when predicting change points. The datasets used for benchmarking are sequences of images with and without change points in them.
    Characterizing metastable states with the help of machine learning. (arXiv:2204.07391v1 [physics.comp-ph])
    Present-day atomistic simulations generate long trajectories of ever more complex systems. Analyzing these data, discovering metastable states, and uncovering their nature is becoming increasingly challenging. In this paper, we first use the variational approach to conformation dynamics to discover the slowest dynamical modes of the simulations. This allows the different metastable states of the system to be located and organized hierarchically. The physical descriptors that characterize metastable states are discovered by means of a machine learning method. We show in the cases of two proteins, Chignolin and Bovine Pancreatic Trypsin Inhibitor, how such analysis can be effortlessly performed in a matter of seconds. Another strength of our approach is that it can be applied to the analysis of both unbiased and biased simulations.
    Enforcing fairness in private federated learning via the modified method of differential multipliers. (arXiv:2109.08604v2 [cs.LG] UPDATED)
    Federated learning with differential privacy, or private federated learning, provides a strategy to train machine learning models while respecting users' privacy. However, differential privacy can disproportionately degrade the performance of the models on under-represented groups, as these parts of the distribution are difficult to learn in the presence of noise. Existing approaches for enforcing fairness in machine learning models have considered the centralized setting, in which the algorithm has access to the users' data. This paper introduces an algorithm to enforce group fairness in private federated learning, where users' data does not leave their devices. First, the paper extends the modified method of differential multipliers to empirical risk minimization with fairness constraints, thus providing an algorithm to enforce fairness in the central setting. Then, this algorithm is extended to the private federated learning setting. The proposed algorithm, \texttt{FPFL}, is tested on a federated version of the Adult dataset and an "unfair" version of the FEMNIST dataset. The experiments on these datasets show how private federated learning accentuates unfairness in the trained models, and how FPFL is able to mitigate such unfairness.
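    The modified differential method of multipliers that the paper extends can be illustrated on a one-dimensional constrained problem. This is purely a textbook toy, unrelated to the federated pipeline; the step sizes and damping coefficient are arbitrary choices.

```python
def mdm_minimize(grad_f, g, grad_g, x0, lr=0.05, c=1.0, steps=2000):
    """Modified differential method of multipliers for min f(x) s.t. g(x)=0:
    gradient descent on x along the damped Lagrangian, gradient ascent on
    the multiplier along the constraint violation."""
    x, lam = x0, 0.0
    for _ in range(steps):
        # The c*g*grad_g term is the damping that stabilizes the dynamics.
        x -= lr * (grad_f(x) + lam * grad_g(x) + c * g(x) * grad_g(x))
        lam += lr * g(x)
    return x, lam

# Toy instance: minimize x^2 subject to x = 1; in FPFL a fairness
# constraint on group losses plays the role of g.
# The constrained optimum is x* = 1 with multiplier lambda* = -2.
x_star, lam_star = mdm_minimize(lambda x: 2 * x,        # grad f
                                lambda x: x - 1.0,      # g
                                lambda x: 1.0,          # grad g
                                x0=5.0)
```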
    Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration. (arXiv:2202.03259v2 [cs.NE] UPDATED)
    It has long been observed that the performance of evolutionary algorithms and other randomized search heuristics can benefit from a non-static choice of the parameters that steer their optimization behavior. Mechanisms that identify suitable configurations on the fly ("parameter control") or via a dedicated training process ("dynamic algorithm configuration") are therefore an important component of modern evolutionary computation frameworks. Several approaches to address the dynamic parameter setting problem exist, but we barely understand which ones to prefer for which applications. As in classical benchmarking, problem collections with a known ground truth can offer very meaningful insights in this context. Unfortunately, settings with well-understood control policies are very rare. One of the few exceptions for which we know which parameter settings minimize the expected runtime is the LeadingOnes problem. We extend this benchmark by analyzing optimal control policies that can select the parameters only from a given portfolio of possible values. This also allows us to compute optimal parameter portfolios of a given size. We demonstrate the usefulness of our benchmarks by analyzing the behavior of the DDQN reinforcement learning approach for dynamic algorithm configuration.
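    For LeadingOnes, the known optimal control policy flips ⌊n/(i+1)⌋ bits when the current fitness is i. A minimal (1+1)-style implementation of that policy, written as a sketch for intuition rather than the benchmark code from the paper, looks like this (the acceptance rule and evaluation cap are implementation choices):

```python
import random

def leading_ones(bits):
    """Fitness: number of leading 1-bits."""
    n = 0
    for b in bits:
        if b != 1:
            break
        n += 1
    return n

def rls_optimal_policy(n, rng, max_evals=100_000):
    """(1+1) search on LeadingOnes with the theoretically optimal
    parameter control: at fitness i, flip floor(n / (i + 1)) bits."""
    x = [rng.randint(0, 1) for _ in range(n)]
    f = leading_ones(x)
    evals = 1
    while f < n and evals < max_evals:
        k = n // (f + 1)                  # optimal number of bits to flip
        y = x[:]
        for i in rng.sample(range(n), k):
            y[i] ^= 1
        fy = leading_ones(y)
        evals += 1
        if fy >= f:                       # accept ties and improvements
            x, f = y, fy
    return f, evals
```

Restricting k to a fixed portfolio of values, as the benchmark extension above does, turns this closed-form policy into a ground-truth target for dynamic algorithm configuration methods.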
    Simple but Effective: CLIP Embeddings for Embodied AI. (arXiv:2111.09888v2 [cs.CV] UPDATED)
    Contrastive language image pretraining (CLIP) encoders have been shown to be beneficial for a range of visual tasks from classification and detection to captioning and image manipulation. We investigate the effectiveness of CLIP visual backbones for Embodied AI tasks. We build incredibly simple baselines, named EmbCLIP, with no task specific architectures, inductive biases (such as the use of semantic maps), auxiliary tasks during training, or depth maps -- yet we find that our improved baselines perform very well across a range of tasks and simulators. EmbCLIP tops the RoboTHOR ObjectNav leaderboard by a huge margin of 20 pts (Success Rate). It tops the iTHOR 1-Phase Rearrangement leaderboard, beating the next best submission, which employs Active Neural Mapping, and more than doubling the % Fixed Strict metric (0.08 to 0.17). It also beats the winners of the 2021 Habitat ObjectNav Challenge, which employ auxiliary tasks, depth maps, and human demonstrations, and those of the 2019 Habitat PointNav Challenge. We evaluate the ability of CLIP's visual representations at capturing semantic information about input observations -- primitives that are useful for navigation-heavy embodied tasks -- and find that CLIP's representations encode these primitives more effectively than ImageNet-pretrained backbones. Finally, we extend one of our baselines, producing an agent capable of zero-shot object navigation that can navigate to objects that were not used as targets during training. Our code and models are available at https://github.com/allenai/embodied-clip  ( 2 min )
    Nanorobot queue: Cooperative treatment of cancer based on team member communication and image processing. (arXiv:2111.11236v3 [cs.RO] UPDATED)
    Although nanorobots have been used clinically for tasks such as gastroscopy, photoacoustic tomography has been proposed to steer nanorobots to deliver drugs at designated points in real time, and there are cases of eliminating "superbacteria" in blood with nanorobots, most of these technologies remain immature: they suffer from low efficiency or low accuracy, or they cannot be mass produced. As a result, the most effective treatments for cancer at this stage remain chemotherapy and radiotherapy, under which patients suffer and cannot be cured. This paper therefore proposes an idealized model of a treatment method that could completely cure cancer: a cooperative treatment based on a nanorobot queue using team-member communication and computer-vision image classification (object detection).
    Grassmannian Optimization for Online Tensor Completion and Tracking with the t-SVD. (arXiv:2001.11419v4 [eess.SP] UPDATED)
    We propose a new fast streaming algorithm for the tensor completion problem of imputing missing entries of a low-tubal-rank tensor using the tensor singular value decomposition (t-SVD) algebraic framework. We show the t-SVD is a specialization of the well-studied block-term decomposition for third-order tensors, and we present an algorithm under this model that can track changing free submodules from incomplete streaming 2-D data. The proposed algorithm uses principles from incremental gradient descent on the Grassmann manifold of subspaces to solve the tensor completion problem with linear complexity and constant memory in the number of time samples. We provide a local expected linear convergence result for our algorithm. Our empirical results are competitive in accuracy but much faster in compute time than state-of-the-art tensor completion algorithms on real applications to recover temporal chemo-sensing and MRI data under limited sampling.
    An interpretable machine learning approach for ferroalloys consumptions. (arXiv:2204.07421v1 [cs.LG])
    This paper is devoted to a practical method for modeling and optimizing ferroalloy consumption. We consider the problem of selecting optimal process control parameters based on the analysis of historical sensor data. We developed an approach that predicts the results of chemical reactions and gives ferroalloy consumption recommendations. The main features of our method are easy interpretation and noise resistance. Our approach is based on the k-means clustering algorithm, decision trees, and linear regression. The main idea of the method is to identify situations where processes behave similarly. For this, we propose a k-means based dataset clustering algorithm and a classification algorithm to determine the cluster. The algorithm can also be applied to various other technological processes; in this article, we demonstrate its application in metallurgy. To test the proposed method, we used it to optimize ferroalloy consumption in Basic Oxygen Furnace steelmaking when finishing steel in a ladle furnace. The minimum required element content for a given steel grade was selected as the predictive model's target variable, and the required amount of the element to be added to the melt as the optimized variable. Keywords: Clustering, Machine Learning, Linear Regression, Steelmaking, Optimization, Gradient Boosting, Artificial Intelligence, Decision Trees, Recommendation services
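    The cluster-then-regress idea (identify similar process situations, then fit one simple interpretable model per cluster) can be sketched as follows. The one-feature setup and the fixed, pre-computed centroids are simplifications for illustration, not the paper's pipeline.

```python
def assign(x, centroids):
    """Index of the nearest 1-D centroid (the 'process situation')."""
    return min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)

def fit_cluster_models(xs, ys, centroids):
    """Per-cluster ordinary least squares: one simple linear model per
    operating regime, keeping the overall model interpretable."""
    models = {}
    for j in range(len(centroids)):
        pts = [(x, y) for x, y in zip(xs, ys) if assign(x, centroids) == j]
        n = len(pts)
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        var = sum((x - mx) ** 2 for x, _ in pts)
        slope = sum((x - mx) * (y - my) for x, y in pts) / var
        models[j] = (slope, my - slope * mx)   # (slope, intercept)
    return models

def predict(x, centroids, models):
    slope, intercept = models[assign(x, centroids)]
    return slope * x + intercept
```

Each cluster's slope and intercept can be read off directly, which is the interpretability property the paper emphasizes.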
    Tighter Theory for Local SGD on Identical and Heterogeneous Data. (arXiv:1909.04746v4 [cs.LG] UPDATED)
    We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.
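    The local SGD scheme analyzed here is simple to state: H local gradient steps per worker, then averaging. Below is a toy heterogeneous-data instance with quadratic objectives whose minima differ per worker; the step size and iteration counts are arbitrary illustrative choices.

```python
def local_sgd(grads, x0, H, rounds, lr):
    """Local SGD: each worker takes H local gradient steps from the shared
    iterate, then the worker iterates are averaged (one communication)."""
    x = x0
    for _ in range(rounds):
        locals_ = []
        for grad in grads:                  # one gradient oracle per worker
            xw = x
            for _ in range(H):
                xw -= lr * grad(xw)
            locals_.append(xw)
        x = sum(locals_) / len(locals_)     # averaging step
    return x

# Heterogeneous data: worker w minimizes (x - a_w)^2 / 2, so the
# worker optima (0 and 4) disagree while the global optimum is 2.
grads = [lambda x, a=a: x - a for a in (0.0, 4.0)]
x_final = local_sgd(grads, x0=10.0, H=5, rounds=50, lr=0.1)
```

In this equal-curvature quadratic toy the averaged fixed point happens to coincide with the global optimum; in general, data heterogeneity biases local SGD away from it, which is exactly the regime the improved analysis quantifies.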
    SuperCone: Unified User Segmentation over Heterogeneous Experts via Concept Meta-learning. (arXiv:2203.07029v2 [cs.LG] UPDATED)
    We study the problem of user segmentation: given a set of users and one or more predefined groups or segments, assign users to their corresponding segments. As an example, for a segment indicating particular interest in a certain area of sports or entertainment, the task is to predict whether each single user will belong to the segment. However, there may exist numerous long-tail prediction tasks that suffer from limited data availability and may be heterogeneous in nature, which makes them hard to capture with single off-the-shelf model architectures. In this work, we present SuperCone, our unified predicative segments system that addresses the above challenges. It builds on top of a flat concept representation that summarizes each user's heterogeneous digital footprints, and uniformly models each prediction task using an approach called "super learning", that is, combining prediction models with diverse architectures or learning methods that are not compatible with each other. Following this, we provide an end-to-end approach that learns to flexibly attend to the best-suited heterogeneous experts adaptively, while at the same time incorporating deep representations of the input concepts that augment the above experts. Experiments show that SuperCone significantly outperforms state-of-the-art recommendation and ranking algorithms on a wide range of predicative segment tasks and public structured data learning benchmarks.
    Safe Reinforcement Learning Using Black-Box Reachability Analysis. (arXiv:2204.07417v1 [cs.RO])
    Reinforcement learning (RL) is capable of sophisticated motion planning and control for robots in uncertain environments. However, state-of-the-art deep RL approaches typically lack safety guarantees, especially when the robot and environment models are unknown. To justify widespread deployment, robots must respect safety constraints without sacrificing performance. Thus, we propose a Black-box Reachability-based Safety Layer (BRSL) with three main components: (1) data-driven reachability analysis for a black-box robot model, (2) a trajectory rollout planner that predicts future actions and observations using an ensemble of neural networks trained online, and (3) a differentiable polytope collision check between the reachable set and obstacles that enables correcting unsafe actions. In simulation, BRSL outperforms other state-of-the-art safe RL methods on a Turtlebot 3, a quadrotor, and a trajectory-tracking point mass with an unsafe set adjacent to the area of highest reward.
    Model-Based Deep Learning of Joint Probabilistic and Geometric Shaping for Optical Communication. (arXiv:2204.07457v1 [eess.SP])
    Autoencoder-based deep learning is applied to jointly optimize geometric and probabilistic constellation shaping for optical coherent communication. The optimized constellation shaping outperforms the 256 QAM Maxwell-Boltzmann probabilistic distribution with extra 0.05 bits/4D-symbol mutual information for 64 GBd transmission over 170 km SMF link.
    A Machine Learning Tutorial for Operational Meteorology, Part I: Traditional Machine Learning. (arXiv:2204.07492v1 [physics.ao-ph])
    Recently, the use of machine learning in meteorology has increased greatly. While many machine learning methods are not new, university classes on machine learning are largely unavailable to meteorology students and are not required to become a meteorologist. The lack of formal instruction has contributed to the perception that machine learning methods are 'black boxes', and thus end-users are hesitant to apply machine learning methods in their everyday workflows. To reduce the opaqueness of machine learning methods and lower hesitancy towards machine learning in meteorology, this paper provides a survey of some of the most common machine learning methods. A familiar meteorological example is used to contextualize the machine learning methods while also discussing machine learning topics using plain language. The following machine learning methods are demonstrated: linear regression; logistic regression; decision trees; random forest; gradient boosted decision trees; naive Bayes; and support vector machines. Beyond discussing the different methods, the paper also contains discussions on the general machine learning process as well as best practices to enable readers to apply machine learning to their own datasets. Furthermore, all code (in the form of Jupyter notebooks and Google Colaboratory notebooks) used to make the examples in the paper is provided in an effort to catalyse the use of machine learning in meteorology.
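    As a flavor of the tutorial's subject matter, here is logistic regression trained by gradient descent on a made-up toy: relative humidity predicting a rain/no-rain label. The data and hyperparameters are invented for illustration and are not from the paper's notebooks.

```python
import math

def train_logreg(xs, ys, lr=0.5, epochs=500):
    """One-feature logistic regression fit by full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))   # predicted P(rain)
            gw += (p - y) * x / n                       # gradient wrt weight
            gb += (p - y) / n                           # gradient wrt bias
        w -= lr * gw
        b -= lr * gb
    return w, b

xs = [0.2, 0.3, 0.4, 0.6, 0.7, 0.8]    # relative humidity (toy data)
ys = [0, 0, 0, 1, 1, 1]                # 1 = rain observed
w, b = train_logreg(xs, ys)
prob_rain = lambda x: 1.0 / (1.0 + math.exp(-(w * x + b)))
```

The fitted weight is directly interpretable: a positive `w` means higher humidity raises the predicted probability of rain, which is the kind of plain-language reading of model parameters the tutorial encourages.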
    Latent Gaussian Model Boosting. (arXiv:2105.08966v4 [cs.LG] UPDATED)
    Latent Gaussian models and boosting are widely used techniques in statistics and machine learning. Tree-boosting shows excellent prediction accuracy on many data sets, but potential drawbacks are that it assumes conditional independence of samples, produces discontinuous predictions for, e.g., spatial data, and it can have difficulty with high-cardinality categorical variables. Latent Gaussian models, such as Gaussian process and grouped random effects models, are flexible prior models which explicitly model dependence among samples and which allow for efficient learning of predictor functions and for making probabilistic predictions. However, existing latent Gaussian models usually assume either a zero or a linear prior mean function which can be an unrealistic assumption. This article introduces a novel approach that combines boosting and latent Gaussian models to remedy the above-mentioned drawbacks and to leverage the advantages of both techniques. We obtain increased prediction accuracy compared to existing approaches in both simulated and real-world data experiments.
    Super Resolution for Turbulent Flows in 2D: Stabilized Physics Informed Neural Networks. (arXiv:2204.07413v1 [math.NA])
    We propose a new design of a neural network for solving a zero shot super resolution problem for turbulent flows. We embed Luenberger-type observer into the network's architecture to inform the network of the physics of the process, and to provide error correction and stabilization mechanisms. In addition, to compensate for decrease of observer's performance due to the presence of unknown destabilizing forcing, the network is designed to estimate the contribution of the unknown forcing implicitly from the data over the course of training. By running a set of numerical experiments, we demonstrate that the proposed network does recover unknown forcing from data and is capable of predicting turbulent flows in high resolution from low resolution noisy observations.
    Invariance Through Inference. (arXiv:2112.08526v2 [cs.LG] UPDATED)
    We introduce a general approach, called Invariance through Inference, for improving the test-time performance of an agent in deployment environments with unknown perceptual variations. Instead of producing invariant visual features through interpolation, invariance through inference turns adaptation at deployment-time into an unsupervised learning problem. This is achieved in practice by deploying a straightforward algorithm that tries to match the distribution of latent features to the agent's prior experience, without relying on paired data. Although simple, we show that this idea leads to surprising improvements on a variety of adaptation scenarios without access to deployment-time rewards, including changes in scene content, camera poses, and lighting conditions. We present results on challenging domains including distractor control suite and sim-to-real transfer for image-based robot manipulation.
    Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information. (arXiv:2204.05255v2 [cs.CR] UPDATED)
    Backdoor attacks insert malicious data into a training set so that, at inference time, the trained model misclassifies inputs patched with a backdoor trigger as the attacker-specified target label. For backdoor attacks to bypass human inspection, it is essential that the injected data appear to be correctly labeled. Attacks with this property are often referred to as "clean-label attacks." Existing clean-label backdoor attacks require knowledge of the entire training set to be effective. Obtaining such knowledge is difficult or impossible because training data are often gathered from multiple sources (e.g., face images from different users). It remains a question whether backdoor attacks still present a real threat. This paper provides an affirmative answer by designing an algorithm to mount clean-label backdoor attacks based only on knowledge of representative examples from the target class. With poisoning equal to or less than 0.5% of the target-class data and 0.05% of the training set, we can train a model to classify test examples from arbitrary classes into the target class when the examples are patched with a backdoor trigger. Our attack works well across datasets and models, even when the trigger is presented in the physical world. We explore the space of defenses and find that, surprisingly, our attack can evade the latest state-of-the-art defenses in their vanilla form, or, after a simple twist, can be adapted to downstream defenses. We study the cause of this intriguing effectiveness and find that, because the trigger synthesized by our attack contains features as persistent as the original semantic features of the target class, any attempt to remove such triggers would inevitably hurt model accuracy first.
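The poisoning mechanics described above can be sketched minimally; note that the trigger synthesis, which is the paper's actual contribution, is replaced here by a fixed hypothetical patch. Only a small budget of target-class images is stamped with the trigger, and their labels are left untouched so the poisons look correctly labeled.

```python
TARGET = 7
TRIGGER = [[255, 255], [255, 255]]  # hypothetical 2x2 corner patch

def patch(img, trigger):
    """Return a copy of img with the trigger stamped in the top-left corner."""
    out = [row[:] for row in img]
    for i, trow in enumerate(trigger):
        for j, v in enumerate(trow):
            out[i][j] = v
    return out

# toy dataset: (image, label) pairs; images are 4x4 grayscale grids
data = [([[0] * 4 for _ in range(4)], label) for label in [7, 7, 7, 7, 1, 2]]

poison_budget = 2  # stands in for the <= 0.5% of the target class above
poisoned, used = [], 0
for img, label in data:
    if label == TARGET and used < poison_budget:
        poisoned.append((patch(img, TRIGGER), label))  # label unchanged!
        used += 1
    else:
        poisoned.append((img, label))

assert used == poison_budget
# no label was flipped, which is what "clean-label" refers to
assert [l for _, l in poisoned] == [l for _, l in data]
```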
    Experimentally realized memristive memory augmented neural network. (arXiv:2204.07429v1 [cs.ET])
    Lifelong on-device learning is a key challenge for machine intelligence, and it requires learning from few, often single, samples. Memory-augmented neural networks have been proposed to achieve this goal, but the memory module has to be stored in off-chip memory due to its size, which has heavily limited their practical use. Previous works on emerging memory-based implementations have had difficulty scaling up, because different modules with various structures are hard to integrate on the same chip, and the small sense margin of the content-addressable memory used for the memory module severely limits the degree-of-mismatch calculation. In this work, we implement the entire memory-augmented neural network architecture on a fully integrated memristive crossbar platform and achieve an accuracy that closely matches standard software on digital hardware for the Omniglot dataset. The successful demonstration is supported by implementing new functions in crossbars beyond the widely reported matrix multiplications. For example, the locality-sensitive hashing operation is implemented in crossbar arrays by exploiting the intrinsic stochasticity of memristor devices. In addition, the content-addressable memory module is realized in crossbars, which also supports degree-of-mismatch computation. Simulations based on experimentally validated models show that such an implementation can be efficiently scaled up for one-shot learning on the Mini-ImageNet dataset. The successful demonstration paves the way for practical on-device lifelong learning and opens possibilities for novel attention-based algorithms not possible in conventional hardware.
    Rethinking Machine Learning Model Evaluation in Pathology. (arXiv:2204.05205v2 [eess.IV] UPDATED)
    Machine Learning has been applied to pathology images in research and clinical practice with promising outcomes. However, standard ML models often lack the rigorous evaluation required for clinical decisions. Machine learning techniques for natural images are ill-equipped to deal with pathology images that are significantly large and noisy, require expensive labeling, are hard to interpret, and are susceptible to spurious correlations. We propose a set of practical guidelines for ML evaluation in pathology that address the above concerns. The paper includes measures for setting up the evaluation framework, effectively dealing with variability in labels, and a recommended suite of tests to address issues related to domain shift, robustness, and confounding variables. We hope that the proposed framework will bridge the gap between ML researchers and domain experts, leading to wider adoption of ML techniques in pathology and improving patient outcomes.
    Synthesizing Informative Training Samples with GAN. (arXiv:2204.07513v1 [cs.LG])
    Remarkable progress has been achieved in synthesizing photo-realistic images with generative adversarial networks (GANs). Recently, GANs have been utilized as training sample generators when obtaining or storing real training data is expensive or even infeasible. However, images generated by traditional GANs are not as informative as real training samples when used to train deep neural networks. In this paper, we propose a novel method to synthesize Informative Training samples with GAN (IT-GAN). Specifically, we freeze a pre-trained GAN model and learn the informative latent vectors that correspond to informative training samples. The synthesized images are required to preserve information for training deep neural networks rather than visual realism or fidelity. Experiments verify that deep neural networks can learn faster and achieve better performance when trained with our IT-GAN generated images. We also show that our method is a promising solution to the dataset condensation problem.
    Streaming Align-Refine for Non-autoregressive Deliberation. (arXiv:2204.07556v1 [cs.CL])
    We propose a streaming non-autoregressive (non-AR) decoding algorithm to deliberate the hypothesis alignment of a streaming RNN-T model. Our algorithm facilitates a simple greedy decoding procedure, and at the same time is capable of producing the decoding result at each frame with limited right context, thus enjoying both high efficiency and low latency. These advantages are achieved by converting the offline Align-Refine algorithm to be streaming-compatible, with a novel transformer decoder architecture that performs local self-attentions for both text and audio, and a time-aligned cross-attention at each layer. Furthermore, we perform discriminative training of our model with the minimum word error rate (MWER) criterion, which has not been done in the non-AR decoding literature. Experiments on voice search datasets and Librispeech show that with reasonable right context, our streaming model performs as well as the offline counterpart, and discriminative training leads to further WER gain when the first-pass model has small capacity.
    Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis. (arXiv:1911.12426v3 [cs.LG] UPDATED)
    We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often includes more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allocation describes a patient as a mixture of topics corresponding to genetic variants. This can be generalized as a Bayesian tensor decomposition to account for multiple feature variables. Our most significant contributions are with hierarchical topic modeling. We design distinct methods of incorporating hierarchical topic modeling, based on nested Chinese restaurant processes and the Pachinko Allocation Machine, into Bayesian tensor decomposition. We apply these models to examine patients with one of four common types of cancer (breast, lung, prostate, and colorectal) and siblings with and without autism spectrum disorder. We link the genes to their biological pathways and combine this information into a tensor of patients, counts of their genetic variants, and the genes' membership in pathways. We find that our trained models outperform baseline models, with respect to coherence, by up to 40%.
    Novelty Search in Representational Space for Sample Efficient Exploration. (arXiv:2009.13579v3 [cs.LG] UPDATED)
    We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards. One key element of our approach is the use of information theoretic principles to shape our representations in a way so that our novelty reward goes beyond pixel similarity. We test our approach on a number of maze tasks, as well as a control problem and show that our exploration approach is more sample-efficient compared to strong baselines.
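The intrinsic reward described above reduces to a simple computation once the low-dimensional encoder is assumed given: novelty is the mean distance from an encoded state to its k nearest neighbors among previously visited encodings. A minimal sketch (the encodings and memory below are hypothetical):

```python
import math

def knn_novelty(z, memory, k=3):
    """Mean distance to the k nearest previously visited encodings."""
    dists = sorted(math.dist(z, m) for m in memory)
    return sum(dists[:k]) / k

# hypothetical visited encodings in a 2-D representational space
memory = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0)]

# a state far from everything visited earns a larger intrinsic reward
assert knn_novelty((5.0, 5.0), memory) > knn_novelty((0.05, 0.05), memory)
```

In the approach above these rewards are computed in the learned representational space rather than pixel space, which is what lets the novelty signal go beyond pixel similarity.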
    INSTA-BNN: Binary Neural Network with INSTAnce-aware Threshold. (arXiv:2204.07439v1 [cs.CV])
    Binary Neural Networks (BNNs) have emerged as a promising solution for reducing the memory footprint and compute costs of deep neural networks. However, BNNs suffer from information loss because binary activations are limited to only two values, resulting in reduced accuracy. To improve the accuracy, previous studies have attempted to control the distribution of binary activations by manually shifting the threshold of the activation function or making the shift amount trainable. In the process, they usually depended on statistical information computed from a batch. We argue that using statistics from a batch fails to capture the crucial information of each input instance in BNN computations, and that the difference between per-instance and batch statistics needs to be considered when determining the binary activation threshold of each instance. Based on this concept, we propose the Binary Neural Network with INSTAnce-aware threshold (INSTA-BNN), which decides the activation threshold value by considering the difference between statistics computed from the batch and from each instance. The proposed INSTA-BNN outperforms the baseline by 2.5% and 2.3% on the ImageNet classification task with comparable computing cost, achieving 68.0% and 71.7% top-1 accuracy on ResNet-18 and MobileNetV1 based models, respectively.
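A hedged sketch of the idea (not the paper's exact threshold function): shift each instance's binarization threshold by the gap between its own pre-activation mean and the batch mean, instead of applying one batch-level threshold to every instance. The shift strength `alpha` and the toy pre-activations are hypothetical.

```python
def binarize(x, thr):
    """Binary activation: +1 above the threshold, -1 below."""
    return [1.0 if v >= thr else -1.0 for v in x]

batch = [[-0.4, -0.2, 0.1], [0.5, 0.9, 1.3]]  # two instances' pre-activations
batch_mean = sum(v for x in batch for v in x) / sum(len(x) for x in batch)

alpha = 0.5  # hypothetical strength of the instance-aware shift
thresholds = []
for x in batch:
    inst_mean = sum(x) / len(x)
    thresholds.append(batch_mean + alpha * (inst_mean - batch_mean))

outputs = [binarize(x, t) for x, t in zip(batch, thresholds)]

# the instance with larger pre-activations gets a higher threshold
assert thresholds[0] < batch_mean < thresholds[1]
```

With a single batch-level threshold, the second instance would saturate to all +1; the instance-aware shift preserves some within-instance contrast.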
    Neural Structured Prediction for Inductive Node Classification. (arXiv:2204.07524v1 [cs.LG])
    This paper studies node classification in the inductive setting, i.e., aiming to learn a model on labeled training graphs and generalize it to infer node labels on unlabeled test graphs. This problem has been extensively studied with graph neural networks (GNNs) by learning effective node representations, as well as traditional structured prediction methods for modeling the structured output of node labels, e.g., conditional random fields (CRFs). In this paper, we present a new approach called the Structured Proxy Network (SPN), which combines the advantages of both worlds. SPN defines flexible potential functions of CRFs with GNNs. However, learning such a model is nontrivial as it involves optimizing a maximin game with high-cost inference. Inspired by the underlying connection between joint and marginal distributions defined by Markov networks, we propose to solve an approximate version of the optimization problem as a proxy, which yields a near-optimal solution, making learning more efficient. Extensive experiments on two settings show that our approach outperforms many competitive baselines.
    Selecting Continuous Life-Like Cellular Automata for Halting Unpredictability: Evolving for Abiogenesis. (arXiv:2204.07541v1 [cs.NE])
    Substantial efforts have been applied to engineering CA with desired emergent properties, such as supporting gliders. Recent work in continuous CA has generated a wide variety of compelling bioreminiscent patterns, and the expansion of CA research into continuous states, multiple channels, and higher dimensions complicates their study. In this work we devise a strategy for evolving CA and CA patterns in two steps, based on the simple idea that CA are likely to be complex and computationally capable if they support patterns that grow indefinitely as well as patterns that vanish completely, and if the difference is difficult to predict in advance. The second part of our strategy evolves patterns by selecting for mobility and conservation of mean cell value. We validate our pattern evolution method by re-discovering gliders in 17 of 17 Lenia CA, and also report 5 new evolved CA that support evolved glider patterns, differing from previously reported Lenia patterns. The CA reported here share neighborhood kernels with previously described Lenia CA, but exhibit a wider range of typical dynamics than their Lenia counterparts. Code for evolving continuous CA is made available under an MIT License.
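To make "continuous CA" concrete, here is a minimal 1-D update step in the spirit of Lenia (illustrative only, not any rule reported above): cells hold values in [0, 1], and a smooth Gaussian growth function of the neighborhood mean is added each step, then clipped. The kernel, mu, sigma, and dt are hypothetical choices.

```python
import math

def step(state, mu=0.35, sigma=0.07, dt=0.1):
    """One update of a 1-D continuous CA with a radius-2 ring neighborhood."""
    n = len(state)
    new = []
    for i in range(n):
        nb = [state[(i + d) % n] for d in (-2, -1, 1, 2)]  # periodic boundary
        u = sum(nb) / len(nb)
        # smooth growth: +1 when the neighborhood mean is near mu, -1 far away
        growth = 2 * math.exp(-((u - mu) ** 2) / (2 * sigma ** 2)) - 1
        new.append(min(1.0, max(0.0, state[i] + dt * growth)))
    return new

state = [0.0] * 20
state[9] = state[10] = 0.5  # a small seed pattern
for _ in range(10):
    state = step(state)

assert all(0.0 <= v <= 1.0 for v in state)  # states stay continuous in [0, 1]
```

Patterns that vanish (as this seed may) versus patterns that grow without bound are exactly the two behaviors the selection strategy above balances.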
    Deep Learning-based List Sphere Decoding for Faster-than-Nyquist (FTN) Signaling Detection. (arXiv:2204.07569v1 [cs.IT])
    Faster-than-Nyquist (FTN) signaling is a candidate non-orthogonal transmission technique to improve the spectral efficiency (SE) of future communication systems. However, such improvements of the SE come at the cost of additional computational complexity to remove the intentionally introduced intersymbol interference. In this paper, we investigate the use of deep learning (DL) to reduce the detection complexity of FTN signaling. To eliminate the need for a noise whitening filter at the receiver, we first present an equivalent FTN signaling model based on a set of orthonormal basis functions and identify its operation region. Second, we propose a DL-based list sphere decoding (DL-LSD) algorithm that selects and updates the initial radius of the original LSD to guarantee a pre-defined number $N_{\text{L}}$ of lattice points inside the hypersphere. This is achieved by training a neural network to output an approximate initial radius that includes $N_{\text{L}}$ lattice points. At the testing phase, if the hypersphere has more than $N_{\text{L}}$ lattice points, we keep the $N_{\text{L}}$ closest points to the point corresponding to the received FTN signal; however, if the hypersphere has fewer than $N_{\text{L}}$ points, we increase the approximate initial radius by a value that depends on the standard deviation of the distribution of the output radii from the training phase. Then, the approximate value of the log-likelihood ratio (LLR) is calculated based on the obtained $N_{\text{L}}$ points. Simulation results show that the computational complexity of the proposed DL-LSD is lower than that of the original LSD by orders of magnitude.
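The radius-update logic described above can be sketched in a few lines, with the neural radius predictor mocked by a given initial radius and a hypothetical training-phase standard deviation: keep the N_L closest lattice points if enough fall inside the hypersphere, otherwise grow the radius and retry.

```python
import math

def lsd_points(received, lattice, radius, n_l, radius_std, grow=1.0):
    """Return the n_l lattice points closest to `received`, growing the
    search radius by grow * radius_std whenever too few points fall inside."""
    while True:
        inside = sorted(
            (math.dist(received, p), p)
            for p in lattice
            if math.dist(received, p) <= radius
        )
        if len(inside) >= n_l:
            return [p for _, p in inside[:n_l]]
        radius += grow * radius_std  # fallback when the predicted radius is too small

# toy 2-D integer lattice and a noisy received point (hypothetical values)
lattice = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]
pts = lsd_points((0.2, 0.1), lattice, radius=0.1, n_l=3, radius_std=0.5)

assert len(pts) == 3
assert (0, 0) in pts  # the nearest lattice point is always retained
```

The initial radius of 0.1 is deliberately too small here, so the sketch exercises the growth path; in the method above that radius would come from the trained network instead.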
    Accurate ADMET Prediction with XGBoost. (arXiv:2204.07532v1 [q-bio.BM])
    The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery as they define efficacy and safety. Here, we apply an ensemble of features, including fingerprints and descriptors, and a tree-based machine learning model, extreme gradient boosting, for accurate ADMET prediction. Our model performs well in the Therapeutics Data Commons ADMET benchmark group. For 22 tasks, our model is ranked first in 10 tasks and top 3 in 18 tasks.
    Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model. (arXiv:2111.00009v2 [eess.AS] UPDATED)
    In typical multi-talker speech recognition systems, a neural network-based acoustic model predicts senone state posteriors for each speaker. These are later used by a single-talker decoder which is applied on each speaker-specific output stream separately. In this work, we argue that such a scheme is sub-optimal and propose a principled solution that decodes all speakers jointly. We modify the acoustic model to predict joint state posteriors for all speakers, enabling the network to express uncertainty about the attribution of parts of the speech signal to the speakers. We employ a joint decoder that can make use of this uncertainty together with higher-level language information. For this, we revisit decoding algorithms used in factorial generative models in early multi-talker speech recognition systems. In contrast with these early works, we replace the GMM acoustic model with DNN, which provides greater modeling power and simplifies part of the inference. We demonstrate the advantage of joint decoding in proof of concept experiments on a mixed-TIDIGITS dataset.
    Adjoined Networks: A Training Paradigm with Applications to Network Compression. (arXiv:2006.05624v5 [cs.LG] UPDATED)
    Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer vision tasks. We conduct an extensive experimental evaluation of our training paradigm on various large-scale datasets. Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet data-set. We further propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network. DAN achieves ResNet-50 level accuracy on ImageNet with $3.8\times$ fewer parameters and $2.2\times$ fewer FLOPs.
    Barwise Compression Schemes for Audio-Based Music Structure Analysis. (arXiv:2202.04981v2 [cs.SD] UPDATED)
    Music Structure Analysis (MSA) consists of segmenting a music piece into several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, this article introduces the use of linear and non-linear compression schemes on barwise audio signals. Compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure using a dynamic programming algorithm. This work explores both low-rank approximation models such as Principal Component Analysis or Nonnegative Matrix Factorization and "piece-specific" Auto-Encoding Neural Networks, with the objective to learn latent representations specific to a given song. Such approaches do not rely on supervision or annotations, which are well known to be tedious to collect and possibly ambiguous in MSA description. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (for 3s tolerance) on the RWC-Pop dataset, showcasing the importance of barwise compression processing for MSA.
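An illustrative sketch of the compression idea, simplified to rank-1 rather than the richer low-rank and autoencoder models above: project barwise feature vectors onto their top principal direction via power iteration, then compare bars through the compressed representation. The "song" and its features are hypothetical.

```python
def rank1_projection(M, iters=100):
    """Project each row of M onto the top right-singular vector of M,
    found by power iteration on M^T M."""
    n_bars, n_feat = len(M), len(M[0])
    v = [1.0] * n_feat
    for _ in range(iters):
        mv = [sum(row[j] * v[j] for j in range(n_feat)) for row in M]      # M v
        w = [sum(M[i][j] * mv[i] for i in range(n_bars)) for j in range(n_feat)]  # M^T M v
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(row[j] * v[j] for j in range(n_feat)) for row in M]

# toy "song": bars 0-1 form one section, bars 2-3 another
bars = [
    [2.0, 2.0, 2.0, 0.0, 0.0, 0.0],
    [2.0, 2.0, 2.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.1, 1.0],
]
p = rank1_projection(bars)

# bars from the same section end up close in the compressed representation
assert abs(p[0] - p[1]) < abs(p[0] - p[2])
```

In the full method, the resulting bar-to-bar similarities would feed the dynamic programming segmentation step mentioned above.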
    GitTables: A Large-Scale Corpus of Relational Tables. (arXiv:2106.07258v4 [cs.DB] UPDATED)
    The success of deep learning has sparked interest in improving relational table tasks, like data preparation and search, with table representation models trained on large table corpora. Existing table corpora primarily contain tables extracted from HTML pages, limiting the capability to represent offline database tables. To train and evaluate high-capacity models for applications beyond the Web, we need resources with tables that resemble relational database tables. Here we introduce GitTables, a corpus of 1M relational tables extracted from GitHub. Our continuing curation aims at growing the corpus to at least 10M tables. Analyses of GitTables show that its structure, content, and topical coverage differ significantly from existing table corpora. We annotate table columns in GitTables with semantic types, hierarchical relations and descriptions from Schema.org and DBpedia. The evaluation of our annotation pipeline on the T2Dv2 benchmark illustrates that our approach provides results on par with human annotations. We present three applications of GitTables, demonstrating its value for learned semantic type detection models, schema completion methods, and benchmarks for table-to-KG matching, data search, and preparation. We make the corpus and code available at https://gittables.github.io.
    NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming. (arXiv:2109.12171v3 [cs.LG] UPDATED)
    Integer programs provide a powerful abstraction for representing a wide range of real-world scheduling problems. Despite their ability to model general scheduling problems, solving large-scale integer programs (IP) remains a computational challenge in practice. The incorporation of more complex objectives such as robustness to disruptions further exacerbates the computational challenge. We present NICE (Neural network IP Coefficient Extraction), a novel technique that combines reinforcement learning and integer programming to tackle the problem of robust scheduling. More specifically, NICE uses reinforcement learning to approximately represent complex objectives in an integer programming formulation. We use NICE to determine assignments of pilots to a flight crew schedule so as to reduce the impact of disruptions. We compare NICE with (1) a baseline integer programming formulation that produces a feasible crew schedule, and (2) a robust integer programming formulation that explicitly tries to minimize the impact of disruptions. Our experiments show that, across a variety of scenarios, NICE produces schedules resulting in 33% to 48% fewer disruptions than the baseline formulation. Moreover, in more severely constrained scheduling scenarios in which the robust integer program fails to produce a schedule within 90 minutes, NICE is able to build robust schedules in less than 2 seconds on average.
    Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning. (arXiv:2202.03666v2 [cs.LG] UPDATED)
    Consider the problem of training robustly capable agents. One approach is to generate a diverse collection of agent policies. Training can then be viewed as a quality diversity (QD) optimization problem, where we search for a collection of performant policies that are diverse with respect to quantified behavior. Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available. However, agent policies typically assume that the environment is not differentiable. To apply DQD algorithms to training agent policies, we must approximate gradients for performance and behavior. We propose two variants of the current state-of-the-art DQD algorithm that compute gradients via approximation methods common in reinforcement learning (RL). We evaluate our approach on four simulated locomotion tasks. One variant achieves results comparable to the current state-of-the-art in combining QD and RL, while the other performs comparably in two locomotion tasks. These results provide insight into the limitations of current DQD algorithms in domains where gradients must be approximated. Source code is available at https://github.com/icaros-usc/dqd-rl
    Universal approximation property of invertible neural networks. (arXiv:2204.07415v1 [cs.LG])
    Invertible neural networks (INNs) are neural network architectures with invertibility by design. Thanks to their invertibility and the tractability of Jacobian, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning. However, their attractive properties often come at the cost of restricting the layer designs, which poses a question on their representation power: can we use these models to approximate sufficiently diverse functions? To answer this question, we have developed a general theoretical framework to investigate the representation power of INNs, building on a structure theorem of differential geometry. The framework simplifies the approximation problem of diffeomorphisms, which enables us to show the universal approximation properties of INNs. We apply the framework to two representative classes of INNs, namely Coupling-Flow-based INNs (CF-INNs) and Neural Ordinary Differential Equations (NODEs), and elucidate their high representation power despite the restrictions on their architectures.
    Sparsifying the Update Step in Graph Neural Networks. (arXiv:2109.00909v3 [cs.LG] UPDATED)
    Message-Passing Neural Networks (MPNNs), the most prominent Graph Neural Network (GNN) framework, have achieved much success in the analysis of graph-structured data. Concurrently, the sparsification of neural network models attracts a great amount of academic and industrial interest. In this paper we conduct a structured, empirical study of the effect of sparsification on the trainable part of MPNNs known as the Update step. To this end, we design a series of models that successively sparsify the linear transform in the Update step. Specifically, we propose the ExpanderGNN model with a tuneable sparsification rate and the Activation-Only GNN, which has no linear transform in the Update step. In agreement with a growing trend in the literature, we change the sparsification paradigm by initialising sparse neural network architectures rather than expensively sparsifying already-trained architectures. Our novel benchmark models enable a better understanding of the influence of the Update step on model performance and outperform existing simplified benchmark models such as the Simple Graph Convolution. The ExpanderGNNs, and in some cases the Activation-Only models, achieve performance on par with their vanilla counterparts on several downstream tasks, while containing significantly fewer trainable parameters. Our code is publicly available at: https://github.com/ChangminWu/ExpanderGNN.
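The "no linear transform in the Update step" idea can be sketched in the spirit of Simple Graph Convolution: propagate node features with a row-normalized adjacency (self-loops added) and leave any learned transform to a separate downstream classifier. The toy graph and features below are hypothetical.

```python
def propagate(adj, x, hops=2):
    """Feature propagation only: average each node with its neighbors
    (self-loop included), repeated for the given number of hops."""
    n = len(adj)
    for _ in range(hops):
        new = []
        for i in range(n):
            nbrs = [j for j in range(n) if adj[i][j]] + [i]  # neighbors + self-loop
            new.append(sum(x[j] for j in nbrs) / len(nbrs))  # row-normalized average
        x = new
    return x

# toy graph: nodes 0 and 1 connected, node 2 isolated
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
x = [1.0, 0.0, 5.0]
out = propagate(adj, x)

# connected nodes are smoothed toward each other; the isolated node is untouched
assert abs(out[0] - out[1]) < abs(x[0] - x[1])
assert out[2] == 5.0
```

The ExpanderGNN variants above sit between this transform-free extreme and a fully dense Update step.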
    Weakly-supervised Temporal Path Representation Learning with Contrastive Curriculum Learning -- Extended Version. (arXiv:2203.16110v3 [cs.LG] UPDATED)
    In step with the digitalization of transportation, we are witnessing a growing range of path-based smart-city applications, e.g., travel-time estimation and travel path ranking. A temporal path (TP) that includes temporal information, e.g., departure time, into the path is fundamental to enabling such applications. In this setting, it is essential to learn generic temporal path representations (TPRs) that consider spatial and temporal correlations simultaneously and that can be used in different applications, i.e., downstream tasks. Existing methods fail to achieve this goal since (i) supervised methods require large amounts of task-specific labels during training and thus fail to generalize the obtained TPRs to other tasks; and (ii) though unsupervised methods can learn generic representations, they disregard the temporal aspect, leading to sub-optimal results. To contend with the limitations of existing solutions, we propose a Weakly-Supervised Contrastive (WSC) learning model. We first propose a temporal path encoder that encodes both the spatial and temporal information of a temporal path into a TPR. To train the encoder, we introduce weak labels that are easy and inexpensive to obtain and are relevant to different tasks, e.g., temporal labels indicating peak vs. off-peak hours derived from departure times. Based on the weak labels, we construct meaningful positive and negative temporal path samples by considering both spatial and temporal information, which facilitates training the encoder with contrastive learning by pulling the positive samples' representations closer while pushing the negative samples' representations away. To better guide the contrastive learning, we propose a learning strategy based on Curriculum Learning such that training proceeds from easy to hard instances. Experimental studies verify the effectiveness of the proposed method.
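A minimal sketch of the contrastive objective described above; the weak labels would define which samples count as positives, but here positives and negatives are given directly. This is a generic InfoNCE-style loss, lower when the positive is closer to the anchor than the negatives are; the toy vectors and temperature are hypothetical.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE: negative log-probability of the positive among all candidates."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    return -(logits[0] - m - math.log(sum(math.exp(l - m) for l in logits)))

anchor = (1.0, 0.0)
negatives = [(0.0, 1.0), (-1.0, 0.2)]
good = info_nce(anchor, (0.9, 0.1), negatives)  # positive aligned with anchor
bad = info_nce(anchor, (0.0, -1.0), negatives)  # positive nearly orthogonal

assert good < bad  # pulling positives closer lowers the loss
```

In the method above, the anchor and positive would be TPRs of paths sharing a weak label (e.g., both peak-hour), and curriculum learning would order such triples from easy to hard.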
    Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning. (arXiv:2202.10629v2 [cs.LG] UPDATED)
    In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models and can even learn general task-agnostic representations for efficient finetuning to downstream tasks. However, deep learning in resource-limited domains still faces the following challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning. This paper introduces a new technique called model reprogramming to bridge this gap. Model reprogramming enables resource-efficient cross-domain machine learning by repurposing and reusing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning, where the source and target domains can be vastly different. In many applications, model reprogramming outperforms transfer learning and training from scratch. This paper elucidates the methodology of model reprogramming, summarizes existing use cases, provides a theoretical explanation on the success of model reprogramming, and concludes with a discussion on open-ended research questions and opportunities. A list of model reprogramming studies is actively maintained and updated at https://github.com/IBM/model-reprogramming.
    Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity. (arXiv:2204.07526v1 [math.ST])
    Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly-used algorithms, such as gradient descent and power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
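One of the algorithms the lower bounds above speak to, the tensor power method, is easy to demonstrate on a tiny noiseless order-3 rank-one tensor T = v ⊗ v ⊗ v (a best-case instance, not the hard regime the paper studies): the iteration u ← T(·, u, u) recovers the planted direction v.

```python
d = 4
v = [0.5, 0.5, 0.5, 0.5]  # planted unit vector
# order-3 rank-one tensor: T[i][j][k] = v_i * v_j * v_k
T = [[[v[i] * v[j] * v[k] for k in range(d)] for j in range(d)] for i in range(d)]

u = [1.0, 0.2, 0.1, -0.3]  # arbitrary start, not orthogonal to v
for _ in range(5):
    # contraction: u'_i = sum_{j,k} T[i][j][k] * u[j] * u[k], then normalize
    new = [sum(T[i][j][k] * u[j] * u[k] for j in range(d) for k in range(d))
           for i in range(d)]
    norm = sum(x * x for x in new) ** 0.5
    u = [x / norm for x in new]

overlap = abs(sum(ui * vi for ui, vi in zip(u, v)))
assert overlap > 0.99  # the planted spike is recovered
```

In the noisy, sample-based setting the paper considers, this same iteration must be run many more times when the sample size is small, which is exactly the pass/sample/memory trade-off the lower bounds quantify.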
    The Distributed Information Bottleneck reveals the explanatory structure of complex systems. (arXiv:2204.07576v1 [cs.LG])
    The fruits of science are relationships made comprehensible, often by way of approximation. While deep learning is an extremely powerful way to find relationships in data, its use in science has been hindered by the difficulty of understanding the learned relationships. The Information Bottleneck (IB) is an information theoretic framework for understanding a relationship between an input and an output in terms of a trade-off between the fidelity and complexity of approximations to the relationship. Here we show that a crucial modification -- distributing bottlenecks across multiple components of the input -- opens fundamentally new avenues for interpretable deep learning in science. The Distributed Information Bottleneck throttles the downstream complexity of interactions between the components of the input, deconstructing a relationship into meaningful approximations found through deep learning without requiring custom-made datasets or neural network architectures. Applied to a complex system, the approximations illuminate aspects of the system's nature by restricting -- and monitoring -- the information about different components incorporated into the approximation. We demonstrate the Distributed IB's explanatory utility in systems drawn from applied mathematics and condensed matter physics. In the former, we deconstruct a Boolean circuit into approximations that isolate the most informative subsets of input components without requiring exhaustive search. In the latter, we localize information about future plastic rearrangement in the static structure of a sheared glass, and find the information to be more or less diffuse depending on the system's preparation. By way of a principled scheme of approximations, the Distributed IB brings much-needed interpretability to deep learning and enables unprecedented analysis of information flow through a system.
    Solving the Dirichlet problem for the Monge-Amp\`ere equation using neural networks. (arXiv:2110.03310v2 [stat.ML] UPDATED)
    The Monge-Amp\`ere equation is a fully nonlinear partial differential equation (PDE) of fundamental importance in analysis, geometry and in the applied sciences. In this paper we solve the Dirichlet problem associated with the Monge-Amp\`ere equation using neural networks and we show that an ansatz using deep input convex neural networks can be used to find the unique convex solution. As part of our analysis we study the effect of singularities, discontinuities and noise in the source function, we consider nontrivial domains, and we investigate how the method performs in higher dimensions. We also compare this method to an alternative approach in which standard feed-forward networks are used together with a loss function which penalizes lack of convexity.
    Transferability Properties of Graph Neural Networks. (arXiv:2112.04629v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) are composed of layers consisting of graph convolutions and pointwise nonlinearities. Due to their invariance and stability properties, GNNs are provably successful at learning representations from data supported on moderate-scale graphs. However, they are difficult to train on large-scale graphs. In this paper, we study the problem of training GNNs on graphs of moderate size and transferring them to large-scale graphs. We use graph limits called graphons to define limit objects for graph filters and GNNs -- graphon filters and graphon neural networks (WNNs) -- which we interpret as generative models for graph filters and GNNs. We then show that graphon filters and WNNs can be approximated by graph filters and GNNs sampled from them on weighted and stochastic graphs. Because the error of these approximations can be upper bounded, by a triangle inequality argument we can further bound the error of transferring a graph filter or a GNN across graphs. Our results show that (i) the transference error decreases with the graph size, and (ii) that graph filters have a transferability-discriminability tradeoff that in GNNs is alleviated by the scattering behavior of the nonlinearity. These findings are demonstrated empirically in a movie recommendation problem and in a decentralized control task.
    Kernel similarity matching with Hebbian neural networks. (arXiv:2204.07475v1 [cs.NE])
    Recent works have derived neural networks with online correlation-based learning rules to perform \textit{kernel similarity matching}. These works applied existing linear similarity matching algorithms to nonlinear features generated with random Fourier methods. In this paper we attempt to perform kernel similarity matching by directly learning the nonlinear features. Our algorithm proceeds by deriving and then minimizing an upper bound for the sum of squared errors between output and input kernel similarities. The construction of our upper bound leads to online correlation-based learning rules which can be implemented with a one-layer recurrent neural network. In addition to generating high-dimensional linearly separable representations, we show that our upper bound naturally yields representations which are sparse and selective for specific input patterns. We compare the approximation quality of our method to the neural random Fourier method and variants of the popular but non-biological "Nystr{\"o}m" method for approximating the kernel matrix. Our method appears to be comparable or better than randomly sampled Nystr{\"o}m methods when the outputs are relatively low dimensional (although still potentially higher dimensional than the inputs) but less faithful when the outputs are very high dimensional.
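The base objective behind similarity matching is easy to state: make the output similarity matrix match the input kernel similarity matrix. A minimal numpy sketch of that sum-of-squared-errors objective follows (the paper minimizes an upper bound on it with online Hebbian rules; the RBF kernel and toy data here are illustrative assumptions):

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Input similarities k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def similarity_matching_loss(X, Y, kernel):
    """Sum of squared errors between input kernel similarities and the
    linear similarities of the learned output representation Y."""
    K_in = kernel(X)                     # (n, n) input kernel similarities
    K_out = Y @ Y.T                      # (n, n) output linear similarities
    return np.sum((K_in - K_out) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))             # inputs
Y = 0.1 * rng.normal(size=(20, 8))       # candidate output representation
loss = similarity_matching_loss(X, Y, rbf_kernel)
```

Minimizing this loss over Y (here just evaluated, not optimized) drives the outputs' inner products toward the input kernel values.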
    Prototype-based Domain Generalization Framework for Subject-Independent Brain-Computer Interfaces. (arXiv:2204.07358v1 [eess.SP])
    Brain-computer interface (BCI) is challenging to use in practice due to the inter/intra-subject variability of electroencephalography (EEG). The BCI system, in general, necessitates a calibration technique to obtain subject/session-specific data in order to tune the model each time the system is utilized. This issue is acknowledged as a key hindrance to BCI, and a new strategy based on domain generalization has recently evolved to address it. In light of this, we have concentrated on developing an EEG classification framework that can be applied directly to data from unknown domains (i.e. subjects), using only data acquired from separate subjects previously. For this purpose, in this paper, we proposed a framework that employs the open-set recognition technique as an auxiliary task to learn subject-specific style features from the source dataset while helping the shared feature extractor with mapping the features of the unseen target dataset as a new unseen domain. Our aim is to impose cross-instance style invariance in the same domain and reduce the open space risk on the potential unseen subject in order to improve the generalization ability of the shared feature extractor. Our experiments showed that using the domain information as an auxiliary network increases the generalization performance.  ( 2 min )
    End-to-End Sensitivity-Based Filter Pruning. (arXiv:2204.07412v1 [cs.CV])
    In this paper, we present a novel sensitivity-based filter pruning algorithm (SbF-Pruner) to learn the importance scores of filters of each layer end-to-end. Our method learns the scores from the filter weights, enabling it to account for the correlations between the filters of each layer. Moreover, by training the pruning scores of all layers simultaneously our method can account for layer interdependencies, which is essential to find a performant sparse sub-network. Our proposed method can train and generate a pruned network from scratch in a straightforward, one-stage training process without requiring a pretrained network. Ultimately, we do not need layer-specific hyperparameters and pre-defined layer budgets, since SbF-Pruner can implicitly determine the appropriate number of channels in each layer. Our experimental results on different network architectures suggest that SbF-Pruner outperforms advanced pruning methods. Notably, on CIFAR-10, without requiring a pretrained baseline network, we obtain 1.02% and 1.19% accuracy gain on ResNet56 and ResNet110, compared to the baseline reported for state-of-the-art pruning algorithms. At the same time, SbF-Pruner reduces the parameter count by 52.3% (for ResNet56) and 54% (for ResNet110), exceeding state-of-the-art pruning algorithms by large margins of 9.5% and 6.6%.  ( 2 min )
    Email Spam Detection Using Hierarchical Attention Hybrid Deep Learning Method. (arXiv:2204.07390v1 [cs.CL])
    Email is one of the most widely used ways to communicate, with millions of people and businesses relying on it to communicate and share knowledge and information on a daily basis. Nevertheless, the rise in email users has been accompanied by a dramatic increase in spam emails in recent years. Processing and managing emails properly for individuals and companies is becoming increasingly difficult. This article proposes a novel technique for email spam detection that is based on a combination of convolutional neural networks, gated recurrent units, and attention mechanisms. During system training, the network is selectively focused on necessary parts of the email text. The usage of convolution layers to extract more meaningful, abstract, and generalizable features by hierarchical representation is the major contribution of this study. Additionally, this contribution incorporates cross-dataset evaluation, which enables the generation of performance results more independent of the model's training dataset. According to cross-dataset evaluation results, the proposed technique advances the results of present attention-based techniques by utilizing temporal convolutions, which provide more flexible receptive field sizes. The suggested technique's findings are compared to those of state-of-the-art models and show that our approach outperforms them.  ( 2 min )
    Towards Building a Personalized Dialogue Generator via Implicit User Persona Detection. (arXiv:2204.07372v1 [cs.CL])
    Current works on personalized dialogue generation primarily contribute to the agent avoiding a contradictory persona and making the response more informative. However, we found that the generated responses from these models are mostly self-centered, with little care for the other party, since they ignore the user's persona. Moreover, we consider that high-quality communication is essentially built on apprehending the persona of the other party. Motivated by this, we propose a novel personalized dialogue generator that detects an implicit user persona. Because it is difficult to collect a large number of personas for each user, we attempt to model the user's potential persona and its representation from the dialogue alone, in the absence of any external information. A perception variable and a fader variable are conceived utilizing Conditional Variational Inference. The two latent variables simulate the process of people becoming aware of the other party's persona and producing the corresponding expression in conversation. Finally, Posterior-discriminated Regularization is presented to enhance the training procedure. Empirical studies demonstrate that, compared with the state-of-the-art methods, ours is more concerned with the user's persona and outperforms them in evaluations.  ( 2 min )
    Anomalous Sound Detection Based on Machine Activity Detection. (arXiv:2204.07353v1 [eess.AS])
    We have developed an unsupervised anomalous sound detection method for machine condition monitoring that utilizes an auxiliary task -- detecting when the target machine is active. First, we train a model that detects machine activity by using normal data with machine activity labels and then use the activity-detection error as the anomaly score for a given sound clip if we have access to the ground-truth activity labels in the inference phase. If these labels are not available, the anomaly score is calculated through outlier detection on the embedding vectors obtained by the activity-detection model. Solving this auxiliary task enables the model to learn the difference between the target machine sounds and similar background noise, which makes it possible to identify small deviations in the target sounds. Experimental results showed that, by means of an ensemble, the proposed method complementarily improves the anomaly-detection performance of the conventional method.  ( 2 min )
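The two scoring modes the abstract describes can be sketched directly: when ground-truth activity labels are available, the activity detector's error is the anomaly score; otherwise, the score comes from outlier detection on the detector's embeddings. A hedged numpy sketch (the helper names, cross-entropy error, and nearest-neighbor outlier detector are illustrative choices, not the authors' exact implementation):

```python
import numpy as np

def activity_error_score(pred_activity, true_activity):
    """Anomaly score when ground-truth machine-activity labels are known:
    the activity detector's mean binary cross-entropy on the clip."""
    p = np.clip(pred_activity, 1e-7, 1 - 1e-7)
    return -np.mean(true_activity * np.log(p) + (1 - true_activity) * np.log(1 - p))

def embedding_outlier_score(z, normal_embeddings):
    """Fallback without labels: distance from the clip's embedding to the
    nearest embedding of a normal training clip."""
    return np.min(np.linalg.norm(normal_embeddings - z, axis=1))

labels = np.array([0, 0, 1, 1, 1, 0])                  # machine active per frame
good = np.array([0.05, 0.10, 0.90, 0.95, 0.80, 0.10])  # detector tracks activity
bad = np.array([0.90, 0.80, 0.20, 0.10, 0.30, 0.70])   # detector confused -> anomalous
```

The intuition: on a normal clip the detector predicts activity well (low score), while an anomalous sound confuses the detector (high score).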
    SSR-HEF: Crowd Counting with Multi-Scale Semantic Refining and Hard Example Focusing. (arXiv:2204.07406v1 [cs.CV])
    Crowd counting based on density maps is generally regarded as a regression task. Deep learning is used to learn the mapping between image content and crowd density distribution. Although great success has been achieved, some pedestrians far away from the camera are difficult to detect, and the number of such hard examples is often large. Existing methods with a simple Euclidean distance loss indiscriminately optimize the hard and easy examples, so the densities of hard examples are usually incorrectly predicted to be lower or even zero, which results in large counting errors. To address this problem, we are the first to propose the Hard Example Focusing (HEF) algorithm for the regression task of crowd counting. The HEF algorithm makes our model rapidly focus on hard examples by attenuating the contribution of easy examples. Then higher importance is given to the hard examples with wrong estimations. Moreover, the scale variations in crowd scenes are large, and scale annotations are labor-intensive and expensive. By proposing a multi-Scale Semantic Refining (SSR) strategy, lower layers of our model can break through the limitation of deep learning to capture semantic features of different scales to sufficiently deal with the scale variation. We perform extensive experiments on six benchmark datasets to verify the proposed method. Results indicate the superiority of our proposed method over the state-of-the-art methods. Moreover, our designed model is smaller and faster.  ( 2 min )
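The core of HEF is a loss weighting that attenuates easy examples so the model focuses on hard ones. A focal-style numpy sketch of that idea (the exact weighting function is an assumption; the paper's formulation may differ):

```python
import numpy as np

def hef_weights(errors, gamma=2.0):
    """Per-pixel focus weights: examples with larger density-estimation
    error (hard examples) get larger weights; easy examples are attenuated."""
    e = np.abs(errors)
    w = (e / (e.max() + 1e-12)) ** gamma
    return w / (w.mean() + 1e-12)        # normalize to roughly unit mean

def hef_loss(pred, target, gamma=2.0):
    """Weighted squared error: a focal-style stand-in for the HEF loss."""
    err = pred - target
    return np.mean(hef_weights(err, gamma) * err ** 2)
```

With `gamma = 0` every example receives equal weight and the loss reduces to ordinary mean squared error; larger `gamma` concentrates the gradient on hard examples.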
    Crowd counting with segmentation attention convolutional neural network. (arXiv:2204.07380v1 [cs.CV])
    Deep learning occupies an undisputed dominance in crowd counting. In this paper, we propose a novel convolutional neural network (CNN) architecture called SegCrowdNet. Despite the complex background in crowd scenes, the proposed SegCrowdNet adaptively highlights the human head region and suppresses the non-head region by segmentation. With the guidance of an attention mechanism, the proposed SegCrowdNet pays more attention to the human head region and automatically encodes a highly refined density map. The crowd count can be obtained by integrating the density map. To adapt to the variation of crowd counts, SegCrowdNet intelligently classifies the crowd count of each image into several groups. In addition, multi-scale features are learned and extracted in the proposed SegCrowdNet to overcome the scale variations of the crowd. To verify the effectiveness of our proposed method, extensive experiments are conducted on four challenging datasets. The results demonstrate that our proposed SegCrowdNet achieves excellent performance compared with the state-of-the-art methods.  ( 2 min )
    Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees. (arXiv:2204.07293v1 [stat.ML])
    We develop a simple and unified framework for nonlinear variable selection that incorporates model uncertainty and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods and neural network). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated gradient measure $\psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_2$. We then (1) provide a principled approach for quantifying variable selection uncertainty by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulation confirms that the proposed algorithm outperforms existing classic and recent variable selection methods.  ( 2 min )
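The importance measure $\psi_j$ is straightforward to compute for any model whose gradients (or finite-difference approximations thereof) are available. Below is a minimal numpy sketch that estimates $\psi_j$ by central finite differences for a black-box $f$ (the helper name and toy model are illustrative; the paper additionally derives a posterior distribution over $\psi_j$, which this sketch does not attempt):

```python
import numpy as np

def variable_importance(f, X, eps=1e-5):
    """Estimate psi_j = E_x[(d f / d x_j)^2] by central finite differences,
    treating the model f as a black box over a sample of inputs X."""
    n, d = X.shape
    psi = np.zeros(d)
    for j in range(d):
        h = np.zeros(d)
        h[j] = eps
        grad_j = (f(X + h) - f(X - h)) / (2.0 * eps)  # (n,) partial derivatives
        psi[j] = np.mean(grad_j ** 2)
    return psi

# Toy model: depends strongly on x0, weakly on x1, and not at all on x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
f = lambda X: 3.0 * X[:, 0] + 0.1 * np.sin(X[:, 1])
psi = variable_importance(f, X)          # importance ranking: x0 > x1 > x2
```

For a differentiable model one would replace the finite differences with exact gradients from automatic differentiation.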
    Spatio-Temporal-Frequency Graph Attention Convolutional Network for Aircraft Recognition Based on Heterogeneous Radar Network. (arXiv:2204.07360v1 [eess.SP])
    This paper proposes a knowledge-and-data-driven graph neural network-based collaboration learning model for reliable aircraft recognition in a heterogeneous radar network. The aircraft recognizability analysis shows that: (1) the semantic feature of an aircraft is its motion patterns driven by the kinetic characteristics, and (2) the grammatical features contained in the radar cross-section (RCS) signals present spatial-temporal-frequency (STF) diversity decided by both the electromagnetic radiation shape and motion pattern of the aircraft. Then an STF graph attention convolutional network (STFGACN) is developed to distill semantic features from the RCS signals received by the heterogeneous radar network. Extensive experiment results verify that the STFGACN outperforms the baseline methods in terms of detection accuracy, and ablation experiments are carried out to further show that expanding the information dimension yields considerable benefits, enabling robust performance in the low signal-to-noise ratio region.  ( 2 min )
    Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts. (arXiv:2204.07312v1 [math.OC])
    The incorporation of cutting planes within the branch-and-bound algorithm, known as branch-and-cut, forms the backbone of modern integer programming solvers. These solvers are the foremost method for solving discrete optimization problems and thus have a vast array of applications in machine learning, operations research, and many other fields. Choosing cutting planes effectively is a major research topic in the theory and practice of integer programming. We conduct a novel structural analysis of branch-and-cut that pins down how every step of the algorithm is affected by changes in the parameters defining the cutting planes added to the input integer program. Our main application of this analysis is to derive sample complexity guarantees for using machine learning to determine which cutting planes to apply during branch-and-cut. These guarantees apply to infinite families of cutting planes, such as the family of Gomory mixed integer cuts, which are responsible for the main breakthrough speedups of integer programming solvers. We exploit geometric and combinatorial structure of branch-and-cut in our analysis, which provides a key missing piece for the recent generalization theory of branch-and-cut.  ( 2 min )
    Knowledgebra: An Algebraic Learning Framework for Knowledge Graph. (arXiv:2204.07328v1 [cs.LG])
    Knowledge graph (KG) representation learning aims to encode entities and relations into dense continuous vector spaces such that knowledge contained in a dataset can be consistently represented. Dense embeddings trained from KG datasets benefit a variety of downstream tasks such as KG completion and link prediction. However, existing KG embedding methods fall short of providing a systematic solution for the global consistency of knowledge representation. We developed a mathematical language for KG based on an observation of their inherent algebraic structure, which we term Knowledgebra. By analyzing five distinct algebraic properties, we proved that the semigroup is the most reasonable algebraic structure for the relation embedding of a general knowledge graph. We implemented an instantiation model, SemE, using simple matrix semigroups, which exhibits state-of-the-art performance on standard datasets. Moreover, we proposed a regularization-based method to integrate chain-like logic rules derived from human knowledge into embedding training, which further demonstrates the power of the developed language. As far as we know, by applying abstract algebra in statistical learning, this work develops the first formal language for general knowledge graphs, and also sheds light on the problem of neural-symbolic integration from an algebraic perspective.  ( 2 min )
    XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding. (arXiv:2204.07316v1 [cs.CL])
    Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks. This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders. Our framework is inspired by cross-modal encoders' success in visual-language tasks, while we alter the learning objective to cater to the language-heavy characteristics of NLU. After training with a small number of extra adaptation steps and finetuning, the proposed XDBERT (cross-modal distilled BERT) outperforms pretrained BERT on general language understanding evaluation (GLUE), situations with adversarial generations (SWAG) benchmarks, and readability benchmarks. We analyze the performance of XDBERT on GLUE to show that the improvement is likely visually grounded.  ( 2 min )
    Ensemble diverse hypotheses and knowledge distillation for unsupervised cross-subject adaptation. (arXiv:2204.07308v1 [cs.RO])
    Recognizing human locomotion intent and activities is important for controlling wearable robots while walking in complex environments. However, human-robot interface signals are usually user-dependent, which causes a classifier trained on source subjects to perform poorly on new subjects. To address this issue, this paper designs the ensemble diverse hypotheses and knowledge distillation (EDHKD) method to realize unsupervised cross-subject adaptation. EDH mitigates the divergence between labeled data of source subjects and unlabeled data of target subjects to accurately classify the locomotion modes of target subjects without labeling data. Compared to previous domain adaptation methods based on a single learner, which may only learn a subset of features from input signals, EDH can learn diverse features by incorporating multiple diverse feature generators and thus increases the accuracy and decreases the variance of classifying target data, but it sacrifices efficiency. To solve this problem, EDHKD (student) distills the knowledge from the EDH (teacher) to a single network to remain efficient and accurate. The performance of the EDHKD is theoretically proved and experimentally validated on a 2D moon dataset and two public human locomotion datasets. Experimental results show that the EDHKD outperforms all other methods. The EDHKD can classify target data with 96.9%, 94.4%, and 97.4% average accuracy on the above three datasets with a short computing time (1 ms). Compared to a benchmark (BM) method, the EDHKD increases average accuracy by 1.3% and 7.1% for classifying the locomotion modes of target subjects. The EDHKD also stabilizes the learning curves. Therefore, the EDHKD is significant for increasing the generalization ability and efficiency of human intent prediction and human activity recognition systems, which will improve human-robot interactions.  ( 2 min )
    Methodical Advice Collection and Reuse in Deep Reinforcement Learning. (arXiv:2204.07254v1 [cs.LG])
    Reinforcement learning (RL) has shown great success in solving many challenging tasks via use of deep neural networks. Although using deep learning for RL brings immense representational power, it also causes a well-known sample-inefficiency problem. This means that the algorithms are data-hungry and require millions of training samples to converge to an adequate policy. One way to combat this issue is to use action advising in a teacher-student framework, where a knowledgeable teacher provides action advice to help the student. This work considers how to better leverage uncertainties about when a student should ask for advice and if the student can model the teacher to ask for less advice. The student could decide to ask for advice when it is uncertain or when both it and its model of the teacher are uncertain. In addition to this investigation, this paper introduces a new method to compute uncertainty for a deep RL agent using a secondary neural network. Our empirical results show that using dual uncertainties to drive advice collection and reuse may improve learning performance across several Atari games.  ( 2 min )
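The dual-uncertainty rule can be sketched in a few lines: ask the teacher only when both the student and the student's learned model of the teacher are uncertain. In the sketch below, uncertainty is proxied by the spread of sampled Q-values for the greedy action (the paper computes uncertainty with a secondary neural network; the threshold and toy Q-value samples here are illustrative assumptions, not the authors' code):

```python
import numpy as np

def predictive_uncertainty(q_samples):
    """Uncertainty proxy: spread of sampled Q-values for the greedy action.
    q_samples has shape (n_samples, n_actions)."""
    greedy = int(np.argmax(q_samples.mean(axis=0)))
    return q_samples[:, greedy].std()

def should_ask_teacher(student_q, teacher_model_q, tau=0.5):
    """Dual-uncertainty rule: request advice only when the student AND its
    learned model of the teacher are both uncertain."""
    return (predictive_uncertainty(student_q) > tau
            and predictive_uncertainty(teacher_model_q) > tau)

confident = np.tile([1.0, 0.0, 0.0], (10, 1))            # zero spread
uncertain = np.array([[i, -i, 0.0] for i in range(10)])  # large spread
```

Gating on both uncertainties keeps the advice budget for states where neither the student's policy nor its cached model of the teacher is reliable.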
    A Differentially Private Probabilistic Framework for Modeling the Variability Across Federated Datasets of Heterogeneous Multi-View Observations. (arXiv:2204.07352v1 [cs.LG])
    We propose a novel federated learning paradigm to model data variability among heterogeneous clients in multi-centric studies. Our method is expressed through a hierarchical Bayesian latent variable model, where client-specific parameters are assumed to be realizations from a global distribution at the master level, which is in turn estimated to account for data bias and variability across clients. We show that our framework can be effectively optimized through expectation maximization (EM) over the latent master's distribution and clients' parameters. We also introduce formal differential privacy (DP) guarantees compatible with our EM optimization scheme. We tested our method on the analysis of multi-modal medical imaging data and clinical scores from distributed clinical datasets of patients affected by Alzheimer's disease. We demonstrate that our method is robust when data is distributed in either an iid or a non-iid manner, even when local parameter perturbation is included to provide DP guarantees. Moreover, the variability of data, views and centers can be quantified in an interpretable manner, while guaranteeing high-quality data reconstruction as compared to state-of-the-art autoencoding models and federated learning schemes. The code is available at https://gitlab.inria.fr/epione/federated-multi-views-ppca.  ( 2 min )
    Crowd counting with crowd attention convolutional neural network. (arXiv:2204.07347v1 [cs.CV])
    Crowd counting is a challenging problem due to scene complexity and scale variation. Although deep learning has achieved great improvement in crowd counting, scene complexity affects the judgement of these methods and they often mistakenly regard some objects as people, causing potentially enormous errors in the crowd counting result. To address the problem, we propose a novel end-to-end model called Crowd Attention Convolutional Neural Network (CAT-CNN). Our CAT-CNN can adaptively assess the importance of a human head at each pixel location by automatically encoding a confidence map. With the guidance of the confidence map, the position of the human head in the estimated density map gets more attention when encoding the final density map, which can effectively avoid enormous misjudgements. The crowd count can be obtained by integrating the final density map. To encode a highly refined density map, the total crowd count of each image is classified in a designed classification task, and we are the first to explicitly map the prior of the population-level category to feature maps. To verify the efficiency of our proposed method, extensive experiments are conducted on three highly challenging datasets. Results establish the superiority of our method over many state-of-the-art methods.  ( 2 min )
    Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities. (arXiv:2204.07321v1 [cs.LG])
    Graph neural networks have emerged as a leading architecture for many graph-level tasks such as graph classification and graph generation with a notable improvement. Among these tasks, graph pooling is an essential component of graph neural network architectures for obtaining a holistic graph-level representation of the entire graph. Although a great variety of methods have been proposed in this promising and fast-developing research field, to the best of our knowledge, little effort has been made to systematically summarize these methods. To set the stage for the development of future works, in this paper, we attempt to fill this gap by providing a broad review of recent methods on graph pooling. Specifically, 1) we first propose a taxonomy of existing graph pooling methods and provide a mathematical summary for each category; 2) next, we provide an overview of the libraries related to graph pooling, including the commonly used datasets, model architectures for downstream tasks, and open-source implementations; 3) then, we further outline in brief the applications that incorporate the idea of graph pooling in a number of domains; 4) and finally, we discuss some critical challenges faced by the current studies and share our insights on potential directions for improving graph pooling in the future.  ( 2 min )
    Revisiting the Adversarial Robustness-Accuracy Tradeoff in Robot Learning. (arXiv:2204.07373v1 [cs.RO])
    Adversarial training (i.e., training on adversarially perturbed input data) is a well-studied method for making neural networks robust to potential adversarial attacks during inference. However, the improved robustness does not come for free but rather is accompanied by a decrease in overall model accuracy and performance. Recent work has shown that, in practical robot learning applications, the effects of adversarial training do not pose a fair trade-off but inflict a net loss when measured in holistic robot performance. This work revisits the robustness-accuracy trade-off in robot learning by systematically analyzing if recent advances in robust training methods and theory in conjunction with adversarial robot learning can make adversarial training suitable for real-world robot applications. We evaluate a wide variety of robot learning tasks ranging from autonomous driving in a high-fidelity environment amenable to sim-to-real deployment, to mobile robot gesture recognition. Our results demonstrate that, while these techniques make incremental improvements on the trade-off on a relative scale, the negative side-effects caused by adversarial training still outweigh the improvements by an order of magnitude. We conclude that more substantial advances in robust learning methods are necessary before they can benefit robot learning tasks in practice.  ( 2 min )
    Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference. (arXiv:2204.07305v1 [cs.CV])
    Few-shot learning (FSL) is an important and topical problem in computer vision that has motivated extensive research into numerous methods spanning from sophisticated meta-learning methods to simple transfer learning baselines. We seek to push the limits of a simple-but-effective pipeline for more realistic and practical settings of few-shot image classification. To this end, we explore few-shot learning from the perspective of neural network architecture, as well as a three-stage pipeline of network updates under different data supplies, where unsupervised external data is considered for pre-training, base categories are used to simulate few-shot tasks for meta-training, and the scarcely labelled data of a novel task is taken for fine-tuning. We investigate questions such as: (1) How does pre-training on external data benefit FSL? (2) How can state-of-the-art transformer architectures be exploited? and (3) How does fine-tuning mitigate domain shift? Ultimately, we show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks such as Mini-ImageNet, CIFAR-FS, CDFSL and Meta-Dataset. Our code and demo are available at https://hushell.github.io/pmf.  ( 2 min )
    auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data. (arXiv:2204.07276v1 [cs.LG])
    Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present auton-survival, an open-source repository of tools to streamline working with censored time-to-event or survival data. auton-survival includes tools for survival regression, adjustment in the presence of domain shift, counterfactual estimation, phenotyping for risk stratification, evaluation, as well as estimation of treatment effects. Through real world case studies employing a large subset of the SEER oncology incidence data, we demonstrate the ability of auton-survival to rapidly support data scientists in answering complex health and epidemiological questions.  ( 2 min )
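For readers unfamiliar with censored time-to-event data, here is a minimal Kaplan-Meier product-limit estimator in plain NumPy. It is a generic illustration of why censoring needs special handling, not auton-survival's API.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimate S(t) at each distinct event time.

    times  : observed time (event or censoring) per subject
    events : 1 if the event occurred, 0 if the subject was censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)              # still under observation
        d = np.sum((times == t) & (events == 1))  # events exactly at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Censored subjects (events == 0) leave the risk set without an event.
t, s = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(list(zip(t, s)))
```

Note how the censored observations at times 2 and 4 shrink the risk set without triggering a drop in the survival curve; naively treating them as events would bias the estimate downward.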
    Unsupervised Probabilistic Models for Sequential Electronic Health Records. (arXiv:2204.07292v1 [cs.LG])
    We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data. These variables represent subject subgroups at the top layer, and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The resulting properties of the trained model generate novel insight from these complex and multifaceted data. In addition, we show how the model can be used to analyze sequences that contribute to assessment of mortality likelihood.  ( 2 min )
    Causal Transformer for Estimating Counterfactual Outcomes. (arXiv:2204.07258v1 [cs.LG])
    Estimating counterfactual outcomes over time from observational data is relevant for many applications (e.g., personalized medicine). Yet, state-of-the-art methods build upon simple long short-term memory (LSTM) networks, thus rendering inferences for complex, long-range dependencies challenging. In this paper, we develop a novel Causal Transformer for estimating counterfactual outcomes over time. Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders. For this, we combine three transformer subnetworks with separate inputs for time-varying covariates, previous treatments, and previous outcomes into a joint network with in-between cross-attentions. We further develop a custom, end-to-end training procedure for our Causal Transformer. Specifically, we propose a novel counterfactual domain confusion loss to address confounding bias: it aims to learn adversarial balanced representations, so that they are predictive of the next outcome but non-predictive of the current treatment assignment. We evaluate our Causal Transformer based on synthetic and real-world datasets, where it achieves superior performance over current baselines. To the best of our knowledge, this is the first work proposing transformer-based architecture for estimating counterfactual outcomes from longitudinal data.  ( 2 min )
    Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models. (arXiv:2204.07288v1 [cs.CL])
    With many real-world applications of Natural Language Processing (NLP) comprising long texts, there has been a rise in NLP benchmarks that measure the accuracy of models that can handle longer input sequences. However, these benchmarks do not consider the trade-offs between accuracy, speed, and power consumption as input sizes or model sizes are varied. In this work, we perform a systematic study of this accuracy vs. efficiency trade-off on two widely used long-sequence models - Longformer-Encoder-Decoder (LED) and Big Bird - during fine-tuning and inference on four datasets from the SCROLLS benchmark. To study how this trade-off differs across hyperparameter settings, we compare the models across four sequence lengths (1024, 2048, 3072, 4096) and two model sizes (base and large) under a fixed resource budget. We find that LED consistently achieves better accuracy at lower energy costs than Big Bird. For summarization, we find that increasing model size is more energy efficient than increasing sequence length for higher accuracy. However, this comes at the cost of a large drop in inference speed. For question answering, we find that smaller models are both more efficient and more accurate due to the larger training batch sizes possible under a fixed resource budget.  ( 2 min )
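A crude way to see why sequence length and model width trade off differently is to count FLOPs. The sketch below uses a common back-of-envelope approximation for one transformer layer (feed-forward width 4d); the constants are rough, this is not the paper's measurement methodology, and FLOPs are only a proxy for energy.

```python
def layer_flops(seq_len, d_model):
    """Rough forward-pass FLOPs for one transformer layer:
    ~12*L*d^2 for the QKV/output/feed-forward matmuls (FF width 4*d),
    plus ~2*L^2*d for attention score computation and value mixing."""
    return 12 * seq_len * d_model**2 + 2 * seq_len**2 * d_model

base = layer_flops(1024, 768)
print("2x sequence length:", layer_flops(2048, 768) / base)
print("2x model width:    ", layer_flops(1024, 1536) / base)
```

At these sizes the d^2 matmul term dominates, so doubling width is more expensive in raw FLOPs than doubling length; which option is more *energy* efficient per unit of accuracy is exactly the empirical question the paper studies.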
    Active Learning for Regression and Classification by Inverse Distance Weighting. (arXiv:2204.07177v1 [cs.LG])
    This paper proposes an active learning algorithm for solving regression and classification problems based on inverse-distance weighting functions for selecting the feature vectors to query. The algorithm has the following features: (i) supports both pool-based and population-based sampling; (ii) is independent of the type of predictor used; (iii) can handle known and unknown constraints on the queryable feature vectors; and (iv) can run either sequentially or in batch mode, depending on how often the predictor is retrained. The method's potential is shown in numerical tests on illustrative synthetic problems and real-world regression and classification datasets from the UCI repository. A Python implementation of the algorithm, which we call IDEAL (Inverse-Distance based Exploration for Active Learning), is available at \url{this http URL}.  ( 2 min )
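A minimal pool-based selection rule in the spirit of inverse-distance weighting can be sketched as follows. This is a hedged illustration, not the released IDEAL code: candidates far from every already-queried point receive a high exploration score, and greedy batch selection re-scores after each pick.

```python
import numpy as np

def idw_scores(pool, queried, eps=1e-12):
    # Inverse-distance weighting: a candidate close to any queried point
    # has a large summed inverse squared distance, hence a low score.
    d2 = ((pool[:, None, :] - queried[None, :, :]) ** 2).sum(-1)
    return 1.0 / (1.0 / (d2 + eps)).sum(axis=1)

def select_batch(pool, queried, k):
    # Greedy batch mode: each pick joins the queried set, so near-duplicate
    # picks (including re-picking the same point) score ~0 afterwards.
    queried = queried.copy()
    picked = []
    for _ in range(k):
        idx = int(np.argmax(idw_scores(pool, queried)))
        picked.append(idx)
        queried = np.vstack([queried, pool[idx]])
    return picked

pool = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [-5.0, 5.0]])
queried = np.array([[0.0, 0.1]])
print(select_batch(pool, queried, 2))
```

With one labeled point near the origin, the rule first queries a far cluster and then the remaining unexplored region rather than a second point from the same cluster, which is the exploration behavior IDW-style acquisition is meant to give.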
    Hierarchical Embedded Bayesian Additive Regression Trees. (arXiv:2204.07207v1 [stat.ME])
    We propose a simple yet powerful extension of Bayesian Additive Regression Trees which we name Hierarchical Embedded BART (HE-BART). The model allows for random effects to be included at the terminal node level of a set of regression trees, making HE-BART a non-parametric alternative to mixed effects models which avoids the need for the user to specify the structure of the random effects in the model, whilst maintaining the prediction and uncertainty calibration properties of standard BART. Using simulated and real-world examples, we demonstrate that this new extension yields superior predictions for many of the standard mixed effects models' example data sets, and yet still provides consistent estimates of the random effect variances. In a future version of this paper, we outline its use in larger, more advanced data sets and structures.  ( 2 min )
    Spatio-Temporal Analysis of Transformer based Architecture for Attention Estimation from EEG. (arXiv:2204.07162v1 [q-bio.NC])
    For many years now, understanding the brain's mechanisms has been a major research subject across many different fields. Brain signal processing, and especially electroencephalography (EEG), has recently attracted growing interest both in academia and industry. One of the main examples is the increasing number of Brain-Computer Interfaces (BCI) aiming to link brains and computers. In this paper, we present a novel framework allowing us to retrieve the attention state, i.e., the degree of attention given to a specific task, from EEG signals. While previous methods often consider the spatial relationship in EEG through electrodes and process them in recurrent or convolutional architectures, we propose here to also exploit the spatial and temporal information with a transformer-based network that has already shown its strength in many machine-learning (ML) studies, e.g. machine translation. In addition to this novel architecture, an extensive study of feature extraction methods, frequency bands and temporal window lengths has also been carried out. The proposed network has been trained and validated on two public datasets and achieves higher results compared to state-of-the-art models. Beyond improved results, the framework could be used in real applications, e.g. monitoring Attention Deficit Hyperactivity Disorder (ADHD) symptoms or vigilance during a driving assessment.  ( 2 min )
    Testing distributional assumptions of learning algorithms. (arXiv:2204.07196v1 [cs.LG])
    There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussian distribution testers do not exist for the $L_1$ and EMD distance measures. A key step in the analysis is a novel characterization of concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory. In contrast, we show strong lower bounds on the combined run-times of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and for monotone Boolean functions under the uniform distribution over $\{0,1\}^n$. Through these lower bounds we exhibit natural problems where there is a dramatic gap between standard agnostic learning run-time and the run-time of the best tester-learner pair.  ( 2 min )
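The tester side of a tester-learner pair can be illustrated with a crude low-degree moment check before trusting a Gaussianity-assuming learner. This toy check is only in the spirit of the paper's moment-matching characterization and is far weaker than its actual tester; the tolerance and degree are illustrative assumptions.

```python
import numpy as np

def gaussian_moment_check(x, max_degree=4, tol=0.3):
    """Accept iff each low-degree sample moment of x is close to the
    corresponding standard-Gaussian moment (0, 1, 0, 3 for k = 1..4)."""
    target = {1: 0.0, 2: 1.0, 3: 0.0, 4: 3.0}
    return all(abs(np.mean(x**k) - target[k]) <= tol
               for k in range(1, max_degree + 1))

rng = np.random.default_rng(0)
print(gaussian_moment_check(rng.standard_normal(100_000)))  # Gaussian passes
print(gaussian_moment_check(rng.uniform(-1, 1, 100_000)))   # uniform fails
```

The uniform sample fails at the second moment already (E[x^2] = 1/3 for Uniform(-1, 1)), so a downstream agnostic learner that relies on Gaussian concentration would not be trusted on that data.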
    Causal Disentanglement with Network Information for Debiased Recommendations. (arXiv:2204.07221v1 [cs.IR])
    Recommender systems aim to recommend new items to users by learning user and item representations. In practice, these representations are highly entangled as they consist of information about multiple factors, including user's interests, item attributes along with confounding factors such as user conformity, and item popularity. Considering these entangled representations for inferring user preference may lead to biased recommendations (e.g., when the recommender model recommends popular items even if they do not align with the user's interests). Recent research proposes to debias by modeling a recommender system from a causal perspective. The exposure and the ratings are analogous to the treatment and the outcome in the causal inference framework, respectively. The critical challenge in this setting is accounting for the hidden confounders. These confounders are unobserved, making it hard to measure them. On the other hand, since these confounders affect both the exposure and the ratings, it is essential to account for them in generating debiased recommendations. To better approximate hidden confounders, we propose to leverage network information (i.e., user-social and user-item networks), which are shown to influence how users discover and interact with an item. Aside from the user conformity, aspects of confounding such as item popularity present in the network information is also captured in our method with the aid of \textit{causal disentanglement} which unravels the learned representations into independent factors that are responsible for (a) modeling the exposure of an item to the user, (b) predicting the ratings, and (c) controlling the hidden confounders. Experiments on real-world datasets validate the effectiveness of the proposed model for debiasing recommender systems.  ( 2 min )
    Robotic and Generative Adversarial Attacks in Offline Writer-independent Signature Verification. (arXiv:2204.07246v1 [cs.RO])
    This study explores how robots and generative approaches can be used to mount successful false-acceptance adversarial attacks on signature verification systems. Initially, a convolutional neural network topology and data augmentation strategy are explored and tuned, producing an 87.12% accurate model for the verification of 2,640 human signatures. Two robots are then tasked with forging 50 signatures, where 25 are used for the verification attack, and the remaining 25 are used for tuning of the model to defend against them. Adversarial attacks on the system show that there exists an information security risk; the Line-us robotic arm can fool the system 24% of the time and the iDraw 2.0 robot 32% of the time. A conditional GAN finds similar success, with around 30% forged signatures misclassified as genuine. Following fine-tune transfer learning of robotic and generative data, adversarial attacks are reduced below the model threshold by both robots and the GAN. It is observed that tuning the model reduces the risk of attack by robots to 8% and 12%, and that conditional generative adversarial attacks can be reduced to 4% when 25 images are presented and 5% when 1000 images are presented.  ( 2 min )
    The training response law explains how deep neural networks learn. (arXiv:2204.07291v1 [cond-mat.dis-nn])
    Deep neural networks are the most widely applied technology of this decade. Despite their fruitful applications, the mechanism behind them is still to be elucidated. We study the learning process with a very simple supervised encoding problem. As a result, we find a simple law in the training response, which describes the neural tangent kernel. The response consists of a power-law-like decay multiplied by a simple response kernel. From this law we construct a simple mean-field dynamical model, which explains how the network learns. During learning, the input space is split into sub-spaces through competition between the kernels. With the iterated splits and aging, the network gains complexity but finally loses its plasticity.  ( 2 min )
    Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks. (arXiv:2204.07261v1 [cs.LG])
    We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admits a scaling limit which is H\"older continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.  ( 2 min )
    Learning two-phase microstructure evolution using neural operators and autoencoder architectures. (arXiv:2204.07230v1 [cond-mat.mtrl-sci])
    Phase-field modeling is an effective mesoscale method for capturing the evolution dynamics of materials, e.g., in spinodal decomposition of a two-phase mixture. However, the accuracy of high-fidelity phase field models comes at a substantial computational cost. Hence, fast and generalizable surrogate models are needed to alleviate the cost in computationally taxing processes such as optimization and design of materials. The intrinsic discontinuous nature of the physical phenomena incurred by the presence of sharp phase boundaries makes the training of the surrogate model cumbersome. We develop a new framework that integrates a convolutional autoencoder architecture with a deep neural operator (DeepONet) to learn the dynamic evolution of a two-phase mixture. We utilize the convolutional autoencoder to provide a compact representation of the microstructure data in a low-dimensional latent space. DeepONet, which consists of two sub-networks, one for encoding the input function at a fixed number of sensor locations (branch net) and another for encoding the locations of the output functions (trunk net), learns the mesoscale dynamics of the microstructure evolution in the latent space. The decoder part of the convolutional autoencoder can then reconstruct the time-evolved microstructure from the DeepONet predictions. The result is an efficient and accurate accelerated phase-field framework that outperforms other neural-network-based approaches while at the same time being robust to noisy inputs.  ( 2 min )
    Minimizing Control for Credit Assignment with Strong Feedback. (arXiv:2204.07249v1 [cs.NE])
    The success of deep learning attracted interest in whether the brain learns hierarchical representations using gradient-based learning. However, current biologically plausible methods for gradient-based credit assignment in deep neural networks need infinitesimally small feedback signals, which is problematic in biologically realistic noisy environments and at odds with experimental evidence in neuroscience showing that top-down feedback can significantly influence neural activity. Building upon deep feedback control (DFC), a recently proposed credit assignment method, we combine strong feedback influences on neural activity with gradient-based learning and show that this naturally leads to a novel view on neural network optimization. Instead of gradually changing the network weights towards configurations with low output loss, weight updates gradually minimize the amount of feedback required from a controller that drives the network to the supervised output label. Moreover, we show that the use of strong feedback in DFC allows learning forward and feedback connections simultaneously, using a learning rule fully local in space and time. We complement our theoretical results with experiments on standard computer-vision benchmarks, showing competitive performance to backpropagation as well as robustness to noise. Overall, our work presents a fundamentally novel view of learning as control minimization, while sidestepping biologically unrealistic assumptions.  ( 2 min )
    Brazilian Court Documents Clustered by Similarity Together Using Natural Language Processing Approaches with Transformers. (arXiv:2204.07182v1 [cs.AI])
    Recent advances in Artificial Intelligence (AI) have produced promising results in solving complex problems in the area of Natural Language Processing (NLP), making it an important tool to help expedite the resolution of judicial proceedings in the legal area. In this context, this work targets the problem of detecting the degree of similarity between judicial documents within an inference group, applying six transformer-based NLP models: BERT, GPT-2 and RoBERTa pre-trained in the Brazilian Portuguese language, and the same three specialized using 210,000 legal proceedings. Documents were pre-processed and had their content transformed into a vector representation using these NLP techniques. Unsupervised learning was used to cluster the lawsuits, with model quality computed from the cosine of the distance between the elements of a group and its centroid. We observed that transformer-based models outperform previous research, with the RoBERTa model specialized in Brazilian Portuguese standing out, making it possible to advance the state of the art in NLP applied to the legal sector.  ( 2 min )
    Relaxing Equivariance Constraints with Non-stationary Continuous Filters. (arXiv:2204.07178v1 [cs.LG])
    Equivariances provide useful inductive biases in neural network modeling, with the translation equivariance of convolutional neural networks being a canonical example. Equivariances can be embedded in architectures through weight-sharing and place symmetry constraints on the functions a neural network can represent. The type of symmetry is typically fixed and has to be chosen in advance. Although some tasks are inherently equivariant, many tasks do not strictly follow such symmetries. In such cases, equivariance constraints can be overly restrictive. In this work, we propose a parameter-efficient relaxation of equivariance that can effectively interpolate between (i) a non-equivariant linear product, (ii) a strict-equivariant convolution, and (iii) a strictly invariant mapping. The proposed parameterization can be thought of as a building block to allow adjustable symmetry structure in neural networks. Compared to non-equivariant or strict-equivariant baselines, we experimentally verify that soft equivariance leads to improved performance in terms of test accuracy on CIFAR-10 and CIFAR-100 image classification tasks.  ( 2 min )
    Alternating Mahalanobis Distance Minimization for Stable and Accurate CP Decomposition. (arXiv:2204.07208v1 [cs.LG])
    CP decomposition (CPD) is prevalent in chemometrics, signal processing, data mining and many more fields. While many algorithms have been proposed to compute the CPD, alternating least squares (ALS) remains one of the most widely used algorithms for computing the decomposition. Recent works have introduced the notion of eigenvalues and singular values of a tensor and explored applications of eigenvectors and singular vectors in areas like signal processing, data analytics and in various other fields. We introduce a new formulation for deriving singular values and vectors of a tensor by considering the critical points of a function different from what is used in the previous work. Computing these critical points in an alternating manner motivates an alternating optimization algorithm which corresponds to the alternating least squares algorithm in the matrix case. However, for tensors with order greater than or equal to $3$, it minimizes an objective function which is different from the commonly used least squares loss. Alternating optimization of this new objective leads to simple updates to the factor matrices with the same asymptotic computational cost as ALS. We show that a subsweep of this algorithm can achieve a superlinear convergence rate for exact CPD with known rank and verify it experimentally. We then view the algorithm as optimizing a Mahalanobis distance with respect to each factor with ground metric dependent on the other factors. This perspective allows us to generalize our approach to interpolate between updates corresponding to the ALS and the new algorithm to manage the tradeoff between stability and fitness of the decomposition. Our experimental results show that for approximating synthetic and real-world tensors, this algorithm and its variants converge to a better conditioned decomposition with comparable and sometimes better fitness compared to the ALS algorithm.  ( 2 min )
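For reference, here is a compact NumPy implementation of the standard CP-ALS baseline that the proposed algorithm modifies (the paper's Mahalanobis-distance variant is not shown; this is the classic least-squares sweep for a third-order tensor):

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding with C-order column indexing.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    # Column-wise Khatri-Rao product: row (i*J + j) is A[i] * B[j].
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_als(T, rank, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((s, rank)) for s in T.shape]
    for _ in range(iters):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            # Normal equations via the Hadamard product of Gram matrices.
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] = unfold(T, mode) @ kr @ np.linalg.pinv(gram)
    return factors

def reconstruct(factors):
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)

# Sanity check: recover an exact rank-2 tensor.
rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((s, 2)) for s in (4, 5, 6))
T = np.einsum('ir,jr,kr->ijk', A, B, C)
err = np.linalg.norm(T - reconstruct(cp_als(T, 2))) / np.linalg.norm(T)
print(f"relative error: {err:.2e}")
```

Each mode update solves a linear least-squares problem with the other factors fixed, which is exactly the structure the paper's alternating Mahalanobis-distance updates reuse with a different objective.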
    Harnessing Interpretable Machine Learning for Origami Feature Design and Pattern Selection. (arXiv:2204.07235v1 [cond-mat.soft])
    Engineering design of origami systems is challenging because comparing different origami patterns requires using categorical features and evaluating multi-physics behavior targets introduces multi-objective problems. This work shows that a decision tree machine learning method is particularly suitable for the inverse design of origami. This interpretable machine learning method can reveal complex interactions between categorical features and continuous features for comparing different origami patterns, can tackle multi-objective problems for designing active origami with multi-physics performance targets, and can extend existing origami shape fitting algorithms to further consider non-geometrical performances of origami systems. The proposed framework shows a holistic way of designing active origami systems for various applications such as metamaterials, deployable structures, soft robots, biomedical devices, and many more.  ( 2 min )
    Physics-Aware Recurrent Convolutional (PARC) Neural Networks to Assimilate Meso-scale Reactive Mechanics of Energetic Materials. (arXiv:2204.07234v1 [cond-mat.mtrl-sci])
    The thermomechanical properties of energetic materials (EM) are known to be a function of their microscopic structures, i.e., morphological configurations of crystals and pores. This microstructural dependency has motivated vigorous research in the EM community, seeking to engineer material microstructures with targeted properties and performance under the materials-by-design paradigm. However, establishing the complex structure-property-performance (SPP) relationships of EMs demands extensive experimental and simulation efforts, and assimilating and encapsulating these relationships in usable models is a challenge. Here, we present a novel deep learning method, Physics-Aware Recurrent Convolutional (PARC) Neural Network, that can "learn" the mesoscale thermo-mechanics of EM microstructures during the shock-to-detonation transition (SDT). We show that this new approach can produce accurate high-fidelity predictions of time-evolving temperature and pressure fields of the same quality as the state-of-the-art direct numerical simulations (DNS), despite the dramatic reduction of computing time, from hours and days on a high-performance computing cluster (HPC) to a little more than a second on a commodity laptop. We also demonstrate that PARC can provide physical insights, i.e., the artificial neurons can illuminate the underlying physics by identifying which microstructural features led to critical hotspots and what are the characteristics of "critical" versus "non-critical" microstructures. This new knowledge generated alongside the capacity to conduct high-throughput experiments will broaden our theoretical understanding of the initiation mechanisms of EM detonation, as a step towards engineering EMs with specific properties.  ( 2 min )
    Diagnosing and Fixing Manifold Overfitting in Deep Generative Models. (arXiv:2204.07172v1 [stat.ML])
    Likelihood-based, or explicit, deep generative models use neural networks to construct flexible high-dimensional densities. This formulation directly contradicts the manifold hypothesis, which states that observed data lies on a low-dimensional manifold embedded in high-dimensional ambient space. In this paper we investigate the pathologies of maximum-likelihood training in the presence of this dimensionality mismatch. We formally prove that degenerate optima are achieved wherein the manifold itself is learned but not the distribution on it, a phenomenon we call manifold overfitting. We propose a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation, and prove that they recover the data-generating distribution in the nonparametric regime, thus avoiding manifold overfitting. We also show that these procedures enable density estimation on the manifolds learned by implicit models, such as generative adversarial networks, hence addressing a major shortcoming of these models. Several recently proposed methods are instances of our two-step procedures; we thus unify, extend, and theoretically justify a large class of models.  ( 2 min )
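The two-step recipe can be made concrete with a deliberately simple stand-in for the paper's reduction/density pairs: linear dimensionality reduction (PCA via SVD) followed by maximum-likelihood Gaussian density estimation in the latent space. All names and the toy manifold below are illustrative.

```python
import numpy as np

def fit_two_step(X, latent_dim):
    """Step 1: dimensionality reduction (PCA via SVD).
    Step 2: maximum-likelihood Gaussian density in the latent space."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:latent_dim].T                      # encoder: z = (x - mean) @ W
    Z = (X - mean) @ W
    mu, cov = Z.mean(axis=0), np.atleast_2d(np.cov(Z, rowvar=False))
    return mean, W, mu, cov

def latent_log_density(x, params):
    # Gaussian log-density of the encoded points, fitted in step 2.
    mean, W, mu, cov = params
    d = (np.atleast_2d(x) - mean) @ W - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = np.sum(d @ np.linalg.inv(cov) * d, axis=1)
    return -0.5 * (quad + logdet + len(mu) * np.log(2 * np.pi))

# Data near a 1-D manifold embedded in 3-D ambient space.
rng = np.random.default_rng(0)
t = rng.standard_normal(2000)
X = np.stack([t, 2 * t, -t], axis=1) + 0.01 * rng.standard_normal((2000, 3))
params = fit_two_step(X, latent_dim=1)
print(latent_log_density(np.array([[0.0, 0.0, 0.0]]), params))
```

Because the density is estimated on the learned low-dimensional coordinates rather than in ambient space, there is no pressure for the likelihood to blow up on the manifold itself, which is the failure mode the paper calls manifold overfitting.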
    Solving the Dirichlet problem for the Monge-Amp\`ere equation using neural networks. (arXiv:2110.03310v2 [stat.ML] UPDATED)
    The Monge-Amp\`ere equation is a fully nonlinear partial differential equation (PDE) of fundamental importance in analysis, geometry and in the applied sciences. In this paper we solve the Dirichlet problem associated with the Monge-Amp\`ere equation using neural networks and we show that an ansatz using deep input convex neural networks can be used to find the unique convex solution. As part of our analysis we study the effect of singularities, discontinuities and noise in the source function, we consider nontrivial domains, and we investigate how the method performs in higher dimensions. We also compare this method to an alternative approach in which standard feed-forward networks are used together with a loss function which penalizes lack of convexity.
    Adjoined Networks: A Training Paradigm with Applications to Network Compression. (arXiv:2006.05624v5 [cs.LG] UPDATED)
    Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer vision tasks. We conduct an extensive experimental evaluation of our training paradigm on various large-scale datasets. Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet data-set. We further propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network. DAN achieves ResNet-50 level accuracy on ImageNet with $3.8\times$ fewer parameters and $2.2\times$ fewer FLOPs.
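For context, the static-teacher soft-label distillation that AN departs from can be written in a few lines. This is a sketch of the Hinton-style loss, not the Adjoined Networks training paradigm itself; the temperature, mixing weight, and example logits are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence to the teacher's
    temperature-softened outputs, scaled by T^2 as is conventional."""
    n = len(labels)
    p_s = softmax(student_logits)
    hard = -np.log(p_s[np.arange(n), labels] + 1e-12).mean()
    p_t = softmax(teacher_logits, T)
    log_s = np.log(softmax(student_logits, T) + 1e-12)
    soft = (p_t * (np.log(p_t + 1e-12) - log_s)).sum(axis=1).mean()
    return alpha * hard + (1 - alpha) * T**2 * soft

teacher = np.array([[4.0, 1.0, -2.0]])
student = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(student, teacher, labels=np.array([0])))
```

AN's departure is that the teacher is not a frozen pre-trained network: both networks train jointly and the smaller network's parameters are shared with the base network, so compression and regularization happen in one training run.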
    Novelty Search in Representational Space for Sample Efficient Exploration. (arXiv:2009.13579v3 [cs.LG] UPDATED)
    We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards. One key element of our approach is the use of information theoretic principles to shape our representations in a way so that our novelty reward goes beyond pixel similarity. We test our approach on a number of maze tasks, as well as a control problem and show that our exploration approach is more sample-efficient compared to strong baselines.
    Causal Disentanglement with Network Information for Debiased Recommendations. (arXiv:2204.07221v1 [cs.IR])
    Recommender systems aim to recommend new items to users by learning user and item representations. In practice, these representations are highly entangled as they consist of information about multiple factors, including the user's interests and item attributes, along with confounding factors such as user conformity and item popularity. Considering these entangled representations for inferring user preference may lead to biased recommendations (e.g., when the recommender model recommends popular items even if they do not align with the user's interests). Recent research proposes to debias by modeling a recommender system from a causal perspective. The exposure and the ratings are analogous to the treatment and the outcome in the causal inference framework, respectively. The critical challenge in this setting is accounting for the hidden confounders. These confounders are unobserved, making it hard to measure them. On the other hand, since these confounders affect both the exposure and the ratings, it is essential to account for them in generating debiased recommendations. To better approximate hidden confounders, we propose to leverage network information (i.e., user-social and user-item networks), which is shown to influence how users discover and interact with an item. Aside from user conformity, confounding aspects such as item popularity present in the network information are also captured in our method with the aid of \textit{causal disentanglement}, which unravels the learned representations into independent factors that are responsible for (a) modeling the exposure of an item to the user, (b) predicting the ratings, and (c) controlling the hidden confounders. Experiments on real-world datasets validate the effectiveness of the proposed model for debiasing recommender systems.
    Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity. (arXiv:2204.07526v1 [math.ST])
    Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly-used algorithms, such as gradient descent and power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
    Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation. (arXiv:2106.05527v4 [cs.LG] UPDATED)
    Recent advances in diffusion models bring state-of-the-art performance on image generation tasks. However, empirical results from previous research on diffusion models imply that there is an inverse correlation between performance on density estimation and on sample generation. This paper argues that the inverse correlation arises because density estimation is mostly contributed by small diffusion times, whereas sample generation mainly depends on large diffusion times. However, training the score network on both small and large diffusion times is demanding because of the loss imbalance issue. To successfully train the score network on both small and large diffusion times, this paper introduces a training technique, Soft Truncation, that softens the truncation time for every mini-batch update and is universally applicable to any type of diffusion model. It turns out that Soft Truncation is equivalent to a diffusion model with a general weight, and we prove the variational bound of the general weighted diffusion model. In view of this variational bound, Soft Truncation becomes a natural way to train the score network. In experiments, Soft Truncation achieves state-of-the-art performance on the CIFAR-10, CelebA, CelebA-HQ $256\times 256$, and STL-10 datasets.  ( 2 min )
    Latent Gaussian Model Boosting. (arXiv:2105.08966v4 [cs.LG] UPDATED)
    Latent Gaussian models and boosting are widely used techniques in statistics and machine learning. Tree-boosting shows excellent prediction accuracy on many data sets, but potential drawbacks are that it assumes conditional independence of samples, produces discontinuous predictions for, e.g., spatial data, and it can have difficulty with high-cardinality categorical variables. Latent Gaussian models, such as Gaussian process and grouped random effects models, are flexible prior models which explicitly model dependence among samples and which allow for efficient learning of predictor functions and for making probabilistic predictions. However, existing latent Gaussian models usually assume either a zero or a linear prior mean function which can be an unrealistic assumption. This article introduces a novel approach that combines boosting and latent Gaussian models to remedy the above-mentioned drawbacks and to leverage the advantages of both techniques. We obtain increased prediction accuracy compared to existing approaches in both simulated and real-world data experiments.  ( 2 min )
    Two-Step Meta-Learning for Time-Series Forecasting Ensemble. (arXiv:2011.10545v2 [stat.ML] UPDATED)
    The amount of historical data collected keeps growing, and automatic time-series forecasting for business intelligence applications is in high demand. While no single time series modeling method is universal to all types of dynamics, forecasting using an ensemble of several methods is often seen as a compromise. Instead of fixing ensemble diversity and size, we propose to predict these aspects adaptively using meta-learning. Meta-learning here considers two separate random forest regression models, built on 390 time-series features, to rank 22 univariate forecasting methods and recommend ensemble size. The forecasting ensemble is consequently formed from the methods ranked as the best, and forecasts are pooled using either a simple or a weighted average (with a weight corresponding to the reciprocal rank). The proposed approach was tested on 12561 micro-economic time-series (expanded to 38633 for various forecasting horizons) of the M4 competition, where meta-learning outperformed the Theta and Comb benchmarks by relative forecasting errors for all data types and horizons. Best overall results were achieved by weighted pooling with a symmetric mean absolute percentage error of 9.21% versus 11.05% obtained using the Theta method.  ( 2 min )
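The weighted pooling step described in the abstract is simple to state in code. Below is a minimal sketch (not the authors' implementation) of reciprocal-rank weighting and the sMAPE metric used in the evaluation; the forecasts and ranks are made-up toy values:

```python
import numpy as np

def pooled_forecast(forecasts, ranks):
    """Weighted average of method forecasts with reciprocal-rank weights.

    forecasts: (n_methods, horizon) array of individual forecasts
    ranks: 1-based ranks of the methods (1 = best)
    """
    w = 1.0 / np.asarray(ranks, dtype=float)
    w /= w.sum()                      # normalize reciprocal ranks to weights
    return w @ np.asarray(forecasts)  # weighted average over methods

def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent."""
    actual = np.asarray(actual, float)
    predicted = np.asarray(predicted, float)
    return 100.0 * np.mean(2.0 * np.abs(predicted - actual)
                           / (np.abs(actual) + np.abs(predicted)))

# Three hypothetical methods forecasting a 4-step horizon
f = np.array([[10.0, 11.0, 12.0, 13.0],
              [ 9.0, 10.0, 11.0, 12.0],
              [14.0, 15.0, 16.0, 17.0]])
combined = pooled_forecast(f, ranks=[1, 2, 3])
```

With ranks [1, 2, 3] the normalized weights come out as [6/11, 3/11, 2/11], so the best-ranked method dominates the pool without the others being discarded.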
    Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis. (arXiv:1911.12426v3 [cs.LG] UPDATED)
    We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often includes more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allocation describes a patient as a mixture of topics corresponding to genetic variants. This can be generalized as a Bayesian tensor decomposition to account for multiple feature variables. Our most significant contributions are with hierarchical topic modeling. We design distinct methods of incorporating hierarchical topic modeling, based on nested Chinese restaurant processes and the Pachinko Allocation Machine, into Bayesian tensor decomposition. We apply these models to examine patients with one of four common types of cancer (breast, lung, prostate, and colorectal) and siblings with and without autism spectrum disorder. We link the genes with their biological pathways and combine this information into a tensor of patients, counts of their genetic variants, and the genes' membership in pathways. We find that our trained models outperform baseline models, with respect to coherence, by up to 40%.  ( 2 min )
    A Statistical Decision-Theoretical Perspective on the Two-Stage Approach to Parameter Estimation. (arXiv:2204.00036v2 [stat.ME] UPDATED)
    One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Expectation-Maximization (EM), can be used when the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach on models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.  ( 2 min )
    Bayesian Nonparametrics for Sparse Dynamic Networks. (arXiv:1607.01624v2 [stat.ML] UPDATED)
    In this paper we propose a Bayesian nonparametric approach to modelling sparse time-varying networks. A positive parameter is associated to each node of a network, which models the sociability of that node. Sociabilities are assumed to evolve over time, and are modelled via a dynamic point process model. The model is able to capture long term evolution of the sociabilities. Moreover, it yields sparse graphs, where the number of edges grows subquadratically with the number of nodes. The evolution of the sociabilities is described by a tractable time-varying generalised gamma process. We provide some theoretical insights into the model and apply it to three datasets: a simulated network, a network of hyperlinks between communities on Reddit, and a network of co-occurrences of words in Reuters news articles after the September 11th attacks.  ( 2 min )
    Enforcing fairness in private federated learning via the modified method of differential multipliers. (arXiv:2109.08604v2 [cs.LG] UPDATED)
    Federated learning with differential privacy, or private federated learning, provides a strategy to train machine learning models while respecting users' privacy. However, differential privacy can disproportionately degrade the performance of the models on under-represented groups, as these parts of the distribution are difficult to learn in the presence of noise. Existing approaches for enforcing fairness in machine learning models have considered the centralized setting, in which the algorithm has access to the users' data. This paper introduces an algorithm to enforce group fairness in private federated learning, where users' data does not leave their devices. First, the paper extends the modified method of differential multipliers to empirical risk minimization with fairness constraints, thus providing an algorithm to enforce fairness in the central setting. Then, this algorithm is extended to the private federated learning setting. The proposed algorithm, \texttt{FPFL}, is tested on a federated version of the Adult dataset and an "unfair" version of the FEMNIST dataset. The experiments on these datasets show how private federated learning accentuates unfairness in the trained models, and how FPFL is able to mitigate such unfairness.  ( 2 min )
    Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees. (arXiv:2204.07293v1 [stat.ML])
    We develop a simple and unified framework for nonlinear variable selection that incorporates model uncertainty and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods and neural networks). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated gradient measure $\psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_2$. We then (1) provide a principled approach for quantifying variable selection uncertainty by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulations confirm that the proposed algorithm outperforms existing classic and recent variable selection methods.  ( 2 min )
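As a rough illustration of the importance measure psi_j, here is a minimal sketch that estimates the squared gradient norm per input variable with central finite differences on a toy model. This covers only the point-estimate part of the measure, not the paper's Bayesian uncertainty machinery:

```python
import numpy as np

def gradient_importance(f, X, eps=1e-5):
    """psi_j = || df/dx_j ||_2^2 over the sample X, via central differences.

    f: callable mapping an (n, d) array to an (n,) array
    X: (n, d) sample of inputs
    """
    n, d = X.shape
    psi = np.zeros(d)
    for j in range(d):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += eps
        Xm[:, j] -= eps
        grad_j = (f(Xp) - f(Xm)) / (2 * eps)  # df/dx_j at each sample point
        psi[j] = np.sum(grad_j ** 2)          # squared L2 norm over the sample
    return psi

# Toy model: depends strongly on x_0, weakly on x_1, not at all on x_2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
f = lambda X: np.sin(3 * X[:, 0]) + 0.1 * X[:, 1]
psi = gradient_importance(f, X)  # psi[0] dominates, psi[2] is ~0
```

For a differentiable model one would of course use autodiff rather than finite differences; the ordering of the psi values is what carries the variable-selection signal.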
    Tighter Theory for Local SGD on Identical and Heterogeneous Data. (arXiv:1909.04746v4 [cs.LG] UPDATED)
    We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.  ( 2 min )
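For readers unfamiliar with the setup, here is a minimal sketch of local SGD with H local steps between communication rounds, using full local gradient steps on a least-squares toy problem rather than the stochastic setting the paper analyzes. Setting H=1 recovers averaging after every step:

```python
import numpy as np

def local_sgd(data, w0, lr=0.1, H=5, rounds=20):
    """Local SGD on least squares: each worker runs H local gradient steps
    on its own (A_m, b_m), then the iterates are averaged.

    data: list of (A_m, b_m) pairs, one per worker
    """
    w = np.asarray(w0, float)
    for _ in range(rounds):
        local_iterates = []
        for A, b in data:
            wm = w.copy()
            for _ in range(H):                         # H local steps
                grad = A.T @ (A @ wm - b) / len(b)
                wm -= lr * grad
            local_iterates.append(wm)
        w = np.mean(local_iterates, axis=0)            # communication round
    return w

# Two workers with heterogeneous data drawn around the same true weights
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
data = []
for _ in range(2):
    A = rng.normal(size=(50, 2))
    data.append((A, A @ w_true + 0.01 * rng.normal(size=50)))
w_hat = local_sgd(data, w0=np.zeros(2))
```

The heterogeneous regime in the abstract corresponds to the workers' (A_m, b_m) pairs being drawn from different distributions; the sketch only illustrates the communication pattern.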
    Causal Transformer for Estimating Counterfactual Outcomes. (arXiv:2204.07258v1 [cs.LG])
    Estimating counterfactual outcomes over time from observational data is relevant for many applications (e.g., personalized medicine). Yet, state-of-the-art methods build upon simple long short-term memory (LSTM) networks, thus rendering inferences for complex, long-range dependencies challenging. In this paper, we develop a novel Causal Transformer for estimating counterfactual outcomes over time. Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders. For this, we combine three transformer subnetworks with separate inputs for time-varying covariates, previous treatments, and previous outcomes into a joint network with in-between cross-attentions. We further develop a custom, end-to-end training procedure for our Causal Transformer. Specifically, we propose a novel counterfactual domain confusion loss to address confounding bias: it aims to learn adversarial balanced representations, so that they are predictive of the next outcome but non-predictive of the current treatment assignment. We evaluate our Causal Transformer based on synthetic and real-world datasets, where it achieves superior performance over current baselines. To the best of our knowledge, this is the first work proposing transformer-based architecture for estimating counterfactual outcomes from longitudinal data.  ( 2 min )
    Warped Dynamic Linear Models for Time Series of Counts. (arXiv:2110.14790v2 [stat.ME] UPDATED)
    Dynamic Linear Models (DLMs) are commonly employed for time series analysis due to their versatile structure, simple recursive updating, ability to handle missing data, and probabilistic forecasting. However, the options for count time series are limited: Gaussian DLMs require continuous data, while Poisson-based alternatives often lack sufficient modeling flexibility. We introduce a novel semiparametric methodology for count time series by warping a Gaussian DLM. The warping function has two components: a (nonparametric) transformation operator that provides distributional flexibility and a rounding operator that ensures the correct support for the discrete data-generating process. We develop conjugate inference for the warped DLM, which enables analytic and recursive updates for the state space filtering and smoothing distributions. We leverage these results to produce customized and efficient algorithms for inference and forecasting, including Monte Carlo simulation for offline analysis and an optimal particle filter for online inference. This framework unifies and extends a variety of discrete time series models and is valid for natural counts, rounded values, and multivariate observations. Simulation studies illustrate the excellent forecasting capabilities of the warped DLM. The proposed approach is applied to a multivariate time series of daily overdose counts and demonstrates both modeling and computational successes.  ( 2 min )
    Universal approximation property of invertible neural networks. (arXiv:2204.07415v1 [cs.LG])
    Invertible neural networks (INNs) are neural network architectures with invertibility by design. Thanks to their invertibility and the tractability of their Jacobians, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning. However, their attractive properties often come at the cost of restricting the layer designs, which poses a question on their representation power: can we use these models to approximate sufficiently diverse functions? To answer this question, we have developed a general theoretical framework to investigate the representation power of INNs, building on a structure theorem of differential geometry. The framework simplifies the approximation problem of diffeomorphisms, which enables us to show the universal approximation properties of INNs. We apply the framework to two representative classes of INNs, namely Coupling-Flow-based INNs (CF-INNs) and Neural Ordinary Differential Equations (NODEs), and elucidate their high representation power despite the restrictions on their architectures.  ( 2 min )
    Distributed Reconstruction of Noisy Pooled Data. (arXiv:2204.07491v1 [cs.IT])
    In the pooled data problem we are given a set of $n$ agents, each of which holds a hidden state bit, either $0$ or $1$. A querying procedure returns for a query set the sum of the states of the queried agents. The goal is to reconstruct the states using as few queries as possible. In this paper we consider two noise models for the pooled data problem. In the noisy channel model, the result for each agent flips with a certain probability. In the noisy query model, each query result is subject to random Gaussian noise. Our results are twofold. First, we present and analyze for both error models a simple and efficient distributed algorithm that reconstructs the initial states in a greedy fashion. Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability. Secondly, we present simulation results of our algorithm and compare its performance with approximate message passing (AMP) algorithms that are conjectured to be optimal in a number of related problems.  ( 2 min )

  • Open

    [R][P] Mask Transfiner for High-Quality Instance Segmentation + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    [P] Spoonfy: Turn any foreign-language video into effective listening practice
    Video (Despacito, slightly NSFW): https://drive.google.com/file/d/12qYKv_yaqGr9GWvHPtE9ng2foJVPfpoE/view?usp=sharing Code & more info: https://github.com/athairus/SpoonfyDemo Discord: https://discord.gg/7wcZZzeSQk Spoonfy is essentially the so-called Telenovela method (learning languages through subtitled video) on steroids: This demo uses a finetuned version of Facebook's M2M100 model to translate Spanish to English (finetuned to do literal translation instead of ordinary translation) and a wav2vec2 model to get Spanish word timings, to present the literal (aka word-by-word) translations as karaoke-style lyrics. What sets Spoonfy apart from other solutions is the way it leverages the massive body of existing subtitled content out there to create learning material. Also because it's FOSS. More details in the code's README. I've been working on this project by myself for a few months now, and I hope you see the potential in it like I do! If so (but also if not), I'd love to hear what you think. And I'd love to get your help improving on what I already built. I have plenty of ideas for how to make the translations even more accurate, and the system more robust and able to handle more sources of content (YouTube, TikTok, Blu-Rays), etc. Thanks for checking it out! submitted by /u/athairus [link] [comments]  ( 1 min )
    [D] Current work on knowledge representation with your preference, and use of language models
    In robotics and autonomous systems, knowledge representation is an important aspect. What are your favorite methods for knowledge representation: formal logic, graphs, or something else, and why do you like that kind of representation? Considering the success of large language models, isn't it a good time to use them in a new kind of representation, so that robots or similar systems can make better decisions in an environment? I still feel there is no common consensus in the community on the correct way of doing knowledge representation; correct me if I am wrong. submitted by /u/projekt_treadstone [link] [comments]  ( 1 min )
    [D] DALL-E 2 vs Disco Diffusion - SHOWDOWN!
    submitted by /u/nin_artificial [link] [comments]  ( 1 min )
    WACV vs. BMVC [R]
    How do they compare in terms of the communities, prestige, competitiveness, and impact? I have a paper accepted to a CVPR workshop and am considering extending it and submitting to one of these. The work is based on explainability in medical vision. It's more methods-oriented rather than large-scale experiments. What are your suggestions? submitted by /u/avd4292 [link] [comments]  ( 1 min )
    [N] [P] Access 100+ image, video & audio datasets in seconds with one line of code & stream them while training ML models with Activeloop Hub (more at docs.activeloop.ai, description & links in the comments below)
    submitted by /u/davidbun [link] [comments]  ( 5 min )
    [D] Is it ok to promise a dataset in your paper, get published and then not release it?
    Recently, I decided to explore NeRF and found a very interesting dataset in the NeRS paper of 3D models, which was published in NeurIPS 2021 four months ago. Authors promised to release their dataset: The filtered dataset with anonymized personally identifiable information (e.g. license plates and phone numbers), masks, initial camera poses, and optimized NeRS cameras will be made available on the project page. However, if you check their project page or github repo — there is nothing there. I do not have much experience in machine learning, but wonder whether it's ok to do this? My thinking was that it is something to look down upon, but in this case it is done by Carnegie Mellon University (which is a top-tier one in ML?) on a top-tier conference (NeurIPS 2021). So I assume it's fine? submitted by /u/throwmeaway-account [link] [comments]  ( 5 min )
    [D] Wasserstein distance lipschitz vs gaussian distribution
    Hi, I heard there are different ways to calculate the Wasserstein distance in a neural network context. First, we can convert the 1-d Wasserstein loss to its dual representation and constrain the critic to be a Lipschitz function; we need to do weight clipping to make our model a Lipschitz function. Second, we can make the neural network output a Gaussian distribution and use the easy closed form, taking the network output as the mean and covariance matrix. So, what are the advantages and disadvantages when comparing them? It may sound ambiguous, but I have not seen a study that compares the two in terms of representation quality, computation, etc... Thank you for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
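For the second option mentioned in the post, the 2-Wasserstein distance between two Gaussians has a well-known closed form: W2^2 = ||mu1 - mu2||^2 + tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}). A minimal numpy sketch (the `sqrtm_psd` helper is my own, restricted to symmetric PSD matrices):

```python
import numpy as np

def sqrtm_psd(S):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(np.asarray(S, float))
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def w2_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form 2-Wasserstein distance between N(mu1, cov1) and N(mu2, cov2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    s2 = sqrtm_psd(cov2)
    cross = sqrtm_psd(s2 @ cov1 @ s2)            # (S2^{1/2} S1 S2^{1/2})^{1/2}
    d2 = np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2.0 * cross)
    return float(np.sqrt(max(d2, 0.0)))          # clamp tiny negative round-off

# 1-d sanity check: distance between N(0, 1) and N(3, 1) is just |0 - 3| = 3
print(w2_gaussian([0.0], [[1.0]], [3.0], [[1.0]]))  # → 3.0
```

The trade-off in the question is then: the dual/critic route makes no distributional assumption but needs a Lipschitz constraint (clipping, gradient penalty, or spectral normalization), while the closed form is exact and cheap but only applies when both distributions are assumed Gaussian.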
    [D] What do you use to make your blog/personal websites?
    I've noticed a lot of folks in ML have a personal website that doubles as a blog to write about their work/projects. As someone looking to build their own website along the same lines, I'm looking for frameworks to try and build it with. What framework do you use to design your site? submitted by /u/SwiftLynx [link] [comments]  ( 3 min )
    [Discussion] Interpretable Neural Network ... ?
    Hi All! I've been working on a linear method that extracts signals from images by learning a set of composable image filters. It can recompose an image using these filters, as seen on this biological histology tissue (real on the right, recomposed on the left): https://preview.redd.it/czgdk6edx0u81.png?width=768&format=png&auto=webp&s=8768f93a749fff7dc41e576a74403096e113942e Because it is a linear method that learns image filters, I had an idea: what if some components of a neural network could be replaced with a learnable set of filters? For those not in the know, image filters are similar to masks that upweight some parts of the image and downweight other parts, similar to a highlighter to select text and a pen to cross out words. I show how in the figure below: https://preview.redd.it/2azco14ex0u81.jpg?width=499&format=pjpg&auto=webp&s=f3795fec06daa13da61ec155159a0ad865524530 Learning a set of image filters with a neural network is a good idea, as neural networks are much more flexible and are considered to be "universal function approximators". So I wrote up a Pytorch package to pass the neural network feature weights from Convolutions and Max Pooling into the linear method to learn a relevant set of filters; results are comparable even on CIFAR10. The caveat is that there is no ReLU, no other activation functions, and no Dropout; only 1 main single linear layer that learns filters... an interpretable neural network! Results are all here (including an ipynb comparing with a base CNN and VGG16): https://github.com/AskExplain/Interpretable-Neural-Net I'll update the GitHub with some figures of why the single layer is interpretable soon... In the meantime, discuss! submitted by /u/TryToExplainHow [link] [comments]  ( 3 min )
    [P] New Graph Data Augmentation Library
    Hello! I recently built grafog, a graph data augmentation library on top of PyTorch Geometric. You can chain together graph augmentations as done in albumentations or torchvision.transforms. Check it out: https://github.com/rish-16/grafog It has the following augmentations: Random Node Drop Random Edge Drop Normalize Features MixUp Strategy Node Feature Masking Edge Feature Masking https://preview.redd.it/c53r7gkrk0u81.png?width=689&format=png&auto=webp&s=8fbe668e82571a7fe5de9ebb5e4690dbd34032bb https://preview.redd.it/5zrj4gkrk0u81.png?width=863&format=png&auto=webp&s=5bd02ea4adaf86b8911fa89372be9f05f9010536 Happy augmenting! submitted by /u/rish-16 [link] [comments]
    [N]: How does OpenAI's DALL-E 2 work?
    submitted by /u/giugiacaglia [link] [comments]
    [D] What is the opposite of an ablative study?
    I have the feeling that this question may be really stupid, but I'll ask it anyway. In ML we often see ablation studies. What is the opposite of it called? In other words: a study that aims to improve a model, and once an improvement is reached, this new model is taken as the basis for further investigations? submitted by /u/Rogitus [link] [comments]  ( 2 min )
  • Open

    Build & share machine learning apps directly in browser using Gradio in Python
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    AI Trippy Dream 19 - Exploring a Colorful Maze
    submitted by /u/LordPewPew777 [link] [comments]
    a better Boids simulation: An artificial life simulation of the flock of birds
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    New AI upscaler tool
    submitted by /u/Recent_Coffee_2551 [link] [comments]
    credit scoring for companies
    Hello everyone, I'm a newbie, so pardon me if you find that my question is stupid. I'm working on a project; here's its description in a nutshell: classify companies by whether they're going to go bankrupt or not, and based on the probability of default (probability of bankruptcy) give each company a score. For example: 88 percent probability of bankruptcy gives score D, 21 percent gives score B, 3 percent gives score A. My question is: what kind of models should I test? Should I go for machine learning algorithms such as logistic regression, kNN, SVM? Should I go for neural networks (ANN)? Or can I use deep learning models like an MLP... probabilistic neural network? Any guidance or advice will be appreciated, and thanks a lot. submitted by /u/YeccAnon4 [link] [comments]  ( 1 min )
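One common pattern for the scoring part of the question above: train any calibrated classifier (logistic regression is the usual baseline in credit scoring) to output a probability of default, then map that probability to a letter grade with cut-offs. A minimal sketch of the mapping step; the cut-off values here are illustrative assumptions, not standard thresholds:

```python
def pd_to_grade(pd, cutoffs=((0.05, "A"), (0.25, "B"), (0.50, "C"))):
    """Map a probability of default (PD) to a letter grade.

    The cut-offs are illustrative only; in practice they should be
    calibrated to the portfolio, e.g., so each grade corresponds to a
    target observed default rate.
    """
    for threshold, grade in cutoffs:
        if pd < threshold:
            return grade
    return "D"  # anything at or above the last cut-off

# Matches the examples in the post: 88% -> D, 21% -> B, 3% -> A
print(pd_to_grade(0.88), pd_to_grade(0.21), pd_to_grade(0.03))  # → D B A
```

Since the grade is a deterministic function of the predicted probability, the modeling question reduces to which classifier gives the best-calibrated probabilities on your data, which is worth checking with a calibration curve, not just accuracy.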
  • Open

    LSTM for time series prediction
    Hi, I am doing a project where I have to predict sales for a company, and I am having some trouble with my LSTM model in Python. All research I have done tells me that an LSTM is as good as, if not better than, an ARIMA model for forecasting on time series data, but my LSTM is significantly worse than my ARIMA model. Would it be possible for anyone to help me see if I have implemented it right? I have used both Tensorflow and Pytorch and both are way worse than the ARIMA model. submitted by /u/magnussendjoko [link] [comments]  ( 2 min )
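A common reason an LSTM underperforms ARIMA is how the series is framed as a supervised problem (and whether it is scaled). A minimal sketch of the sliding-window framing, independent of any particular framework; the lookback length is a toy value:

```python
import numpy as np

def make_windows(series, lookback):
    """Frame a univariate series as (X, y) pairs for sequence models:
    each sample is `lookback` consecutive values, the target is the next one."""
    series = np.asarray(series, float)
    X = np.stack([series[i:i + lookback]
                  for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X[..., None], y  # shape (n, lookback, 1), as LSTM layers expect

s = np.arange(10.0)                  # toy "sales" series 0, 1, ..., 9
X, y = make_windows(s, lookback=3)   # X[0] is [[0],[1],[2]], y[0] is 3.0
```

Beyond the framing, it is usually worth scaling the inputs, splitting train/validation chronologically rather than randomly, and checking whether the LSTM has simply learned to echo the last observed value, which would make it look worse than ARIMA on trending data.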
  • Open

    Learning style of play (different agents' actions) in the same offline RL environment?
    Hi, everyone. I'm a relative novice in RL, so bear with me as I try to formulate my question. I'm working on a chess bot that can play moves like a player (imitate their style of play) that is chosen from a set of players (that the bot is trained on), if I give the bot the previous x moves. Using more technical terms, I'm trying to create an agent that is given a sequence of states-actions of another agent (player) and some representation of who that agent (player) is, and predict the next action (continue playing in the style of that player). I'm fairly certain this is an RL problem, as I don't know how to frame it as a supervised learning problem (I might be wrong). I've seen some papers that abstract offline RL as a sequence modeling problem (Decision Transformer, Trajectory Transformer), so I'm fairly certain I should continue in a similar manner. But I'm having a hard time trying to understand how to treat the difference in players. My instinct was to use some representation of the player as the reward, but then how would I even optimize for it or even give it as an input? Do I just add the player as a feature in the game state, but then what should be the reward? Has this been done before, or something similar? I couldn't really find any paper or code that worked on differentiating the training data by who made it (I might not be wording it correctly). submitted by /u/OverhypeUnderdeliver [link] [comments]  ( 3 min )
  • Open

    Using Data Warehousing as a Service (DWaaS) To Improve Customer Experience
    Data has become a huge area of business, helping businesses drive their intelligence, make better decisions, and formulate strategic plans for future growth. The post Using Data Warehousing as a Service (DWaaS) To Improve Customer Experience appeared first on Data Science Central.  ( 7 min )
    ML classifies gravitational-wave glitches with high accuracy
    The LIGO observatory can detect astronomical events from billions of light years away. Terabytes of complex daily data make human analysis impossible. A new study applies a neural network with up to 97% classification accuracy. Caltech/MIT’s LIGO, the largest gravitational-wave observatory in the world, collects data on minute space-time ripples from cataclysmic astronomical events like colliding black… Read More »  ( 4 min )
    Zero Trust Principles: What is Zero Trust Model?
    The central principle of the Zero Trust model is the authentication and verification of every device connecting to the network before it is trusted. Former Forrester analyst and veteran of the high-technology world John Kindervag, who has been an active part of a wide array of network technology projects, coined the term “Zero Trust”… Read More »  ( 4 min )

  • Open

    [R] Questions about ACL Rolling Review
    A few questions about ACL ARR: - If you request to reassign a reviewer, would the editor aim to reassign all three reviewers, or only that particular reviewer? Assume you have given a valid reason for reassignment and the editor is convinced. - If you request to reassign a reviewer, can the new reviewer see the previous reviews/scores before submitting their own review? Or would they get access to the previous revision only after submitting their own review? I already know (have heard) that in many cases reviewers are not available, and it becomes inevitable to get an entirely new set of reviews. But my questions are about the case where reviewer availability is not an issue. Just trying to find out how things are managed. submitted by /u/sim_inf [link] [comments]  ( 1 min )
    [R][P] MultiMAE: Multi-modal Multi-task Masked Autoencoders + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    [Discussion] Is it possible to find a SWE job with a DS master's degree? Or would it be possible to make the transition later on?
    Say a master's student graduates from a DS program with a heavy focus on data and CS (so they know the basics of CS like data structures and programming, and have also studied courses like data mining, big data analytics, and machine learning). What are their possible job openings and relatively easy positions to get into? My understanding is really rudimentary, so feel free to correct me: Data scientist, this should be the most fitting and easiest position to get an interview for. Difficulty level 1/5. Data analyst, same as 1. Difficulty level 1/5. Data engineer, same as 1. Difficulty level 1/5. Machine learning engineer, has a higher bar than 1, 2, and 3, and it's very difficult to get interviews without the proper background and work experience. So it's very difficult to become one as a master's graduate in DS, but it's quite possible for a DS/DE (much less so for a DA) to make it into MLE positions. Difficulty level 3/5. Software engineer, it's another realm entirely, with little skill overlap with 1, 2, and 3. So it's very hard for DS students to make the transition or land a SWE job. Difficulty level 5/5. submitted by /u/Competitive_Map_935 [link] [comments]  ( 1 min )
    [Project] Open-source playground to generate images from text using DALL-E Mini
    submitted by /u/koryoislie [link] [comments]
    [D] Incorporating node features into GNNs?
    Hey all, I am looking to learn more about how to incorporate node features into node embeddings for training. Specifically, I am working with gene-gene interaction networks, and also want to include RNA-sequencing quantifications. If anyone has a good introductory resource so I can familiarize myself with the process, I would really appreciate it! submitted by /u/PM_ME_A_ONELINER [link] [comments]  ( 1 min )
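As a starting point before reaching for a GNN library: RNA-seq quantifications can simply be stacked into the node feature matrix X and propagated over the gene-gene adjacency. One symmetric-normalized GCN layer in plain numpy (toy sizes, untrained weights):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step: relu(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

# 3 genes, 2 RNA-seq-derived features per gene (hypothetical values).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # interactions
X = np.array([[1.0, 0.5], [0.2, 0.9], [0.7, 0.1]])            # expression features
W = np.ones((2, 4)) * 0.1                                     # untrained weights
H = gcn_layer(A, X, W)                                        # new embeddings
```

In PyTorch Geometric the same idea is the `x` argument of a `Data` object; the point is that measured features and learned embeddings live in the same matrix.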
    [D] Moderation uniformity in subreddit
    This isn't meant to be a rant; rather far from it. Yesterday I posted a legitimate question about database choices in r/MachineLearning, about the technical choices ML members are currently making for large-scale data ingestion in a continual learning environment. The post was removed. That post was (1) not a beginner question, (2) not offensive, (3) aimed at a constructive discussion suitable as a mid-range ML query, and (4) marked with the appropriate flair. I finally posted it elsewhere. Yet today I see questions about transitions between DS -> MLE and quirky lab-group names derived from ML terms. These aren't even research questions. https://www.reddit.com/r/MachineLearning/comments/u503vz/d_is_it_easier_to_transition_to_mle_as_a_ds_or_swe/ https://www.reddit.com/r/MachineLearning/comments/u5091o/d_do_you_know_any_funny_team_names_with/ How is this fair moderation, genuinely? How can we improve the noise filter or do better moderation? submitted by /u/mlbloke [link] [comments]  ( 2 min )
    [D] Counterfactual Fairness
    So I watched this old video by Microsoft Research: https://www.youtube.com/watch?v=psA4U6nhZ70 To summarize, it uses the fairness criterion that sensitive attribute A will give the same prediction regardless of its value, when using counterfactuals. That is, whether you're male or female shouldn't influence the model's predictions. The idea seems decent at first glance. But what if the "bias" or "unfairness" that the model creates based on sensitive attribute A isn't caused by a dataset bias but rather reflects a real signal in the data? The model proposed by Microsoft Research doesn't take into consideration that the prediction's dependence on the sensitive attribute A does not necessarily consist of ONLY unfairness; they simply define it as such. Is such an algorithmic design choice not exactly one of the flaws that we seek to eliminate? Assuming that not all of the imbalance in the model's predictions across the sensitive attribute A is caused by "unfairness", but that some of it is caused by an inherent difference, then are they not introducing direct human bias and unfairness into their model by explicitly designing the system to fit their own human (and political) bias? Don't get me wrong; the opposite is just as bad. Assuming that ALL of the imbalance in predictions across the sensitive attribute A is caused by "inherent differences" is just as bad. Do you know of anyone who has tackled this in a good manner? How would you even begin to estimate how much is due to an "inherent difference" and how much is due to "bias, unfairness, noise" (or otherwise)? submitted by /u/caahel [link] [comments]  ( 4 min )
    [D] Paper Explained - Transformer Memory as a Differentiable Search Index (Full Video Walkthrough)
    https://youtu.be/qlB0TPBQ7YY Search engines work by building an index and then looking up things in it. Usually, that index is a separate data structure. In keyword search, we build and store reverse indices. In neural search, we build nearest-neighbor indices. This paper does something different: It directly trains a Transformer to return the ID of the most relevant document. No similarity search over embeddings or anything like this is performed, and no external data structure is needed, as the entire index is essentially captured by the model's weights. The paper experiments with various ways of representing documents and training the system, which works surprisingly well! OUTLINE: 0:00 - Intro 0:45 - Sponsor: Diffgram 1:35 - Paper overview 3:15 - The search problem, classic and neural 8:15 - Seq2seq for directly predicting document IDs 11:05 - Differentiable search index architecture 18:05 - Indexing 25:15 - Retrieval and document representation 33:25 - Training DSI 39:15 - Experimental results 49:25 - Comments & Conclusions ​ Paper: https://arxiv.org/abs/2202.06991 submitted by /u/ykilcher [link] [comments]  ( 1 min )
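To make the "index lives in the weights" point concrete, here is a deliberately tiny caricature (not the paper's T5-based setup): a softmax-regression model memorizes (document, docid) pairs at "indexing" time, and at query time emits a docid directly, with no external data structure.

```python
import numpy as np

vocab = {"cat": 0, "dog": 1, "food": 2, "leash": 3}
# "Indexing": (document text, docid) pairs the model memorizes.
corpus = [("cat food", 0), ("dog leash", 1)]

def bow(text):
    """Bag-of-words vector for a whitespace-tokenized string."""
    v = np.zeros(len(vocab))
    for w in text.split():
        v[vocab[w]] += 1
    return v

W = np.zeros((len(vocab), 2))             # the entire "index" lives here
for _ in range(200):                      # plain softmax regression by SGD
    for text, docid in corpus:
        x = bow(text)
        logits = x @ W
        p = np.exp(logits - logits.max()); p /= p.sum()
        W -= 0.5 * np.outer(x, p - np.eye(2)[docid])

# "Retrieval": the model outputs the docid directly from the query.
pred = int(np.argmax(bow("cat") @ W))
```

The real DSI replaces this linear map with a seq2seq Transformer that *generates* docid strings token by token, but the absence of any separate index is the same.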
    [R] Useful method to train models for adversarial robustness
    submitted by /u/IncredibleMac [link] [comments]
    [P] Comparing Default VS Custom Reward Function for Optimal Health Management of a DeepRL Agent Playing Tekken
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 1 min )
    [D] Spotify's Podcast Search Explained
    I wrote this article breaking down how Spotify have applied semantic search to enhance podcast discovery. I find it super interesting to see the approach Spotify have used in terms of data sources, model fine-tuning, and vector search - and wanted to show how to almost replicate it. Let me know if you have any thoughts on their approach! submitted by /u/jamescalam [link] [comments]
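The retrieval half of such a system reduces to nearest-neighbor search over embeddings. A minimal numpy sketch, with random vectors standing in for the outputs of a real fine-tuned bi-encoder:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-ins for encoder outputs; a real system would embed episode transcripts.
episode_embs = rng.normal(size=(1000, 64))
query_emb = episode_embs[42] + 0.01 * rng.normal(size=64)   # "near" episode 42

def top_k(query, corpus, k=5):
    """Rank corpus rows by cosine similarity to the query; return top-k indices."""
    q = query / np.linalg.norm(query)
    C = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return np.argsort(-(C @ q))[:k]

hits = top_k(query_emb, episode_embs)
```

At Spotify's scale the exact scan above would be replaced by an approximate nearest-neighbor index, but the scoring rule is the same.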
    [R] Machine learning in management of precautionary closures caused by lipophilic biotoxins
    In this work, we present a deep study of alternatives for improving mussel aquaculture with very noisy and unbalanced data: https://www.sciencedirect.com/science/article/pii/S0168169922002733 submitted by /u/ennanco [link] [comments]
    [P] RR-GCN now supports multi-modal learning!
    We have just released v0.0.2 of our RR-GCN. This release includes support for multi-modal learning: node embeddings can now be initialised with literal information or pre-trained embeddings for text and image data. Go check out our notebooks, which show how we can achieve state-of-the-art performance on several benchmark datasets in less than one minute. Moreover, and more importantly, the representations produced by our RR-GCN are unsupervised and parameter-free (i.e., no training is required), making it possible to re-use them for multiple downstream ML tasks with high predictive performance. https://github.com/predict-idlab/RR-GCN submitted by /u/givdwiel [link] [comments]  ( 1 min )
    [D] Paper Explained – SEER explained: Vision Models more Robust & Fair when pretrained on UNCURATED images!?
    https://youtu.be/XHAoV_nKr1o This video explains the 10 billion parameter SEER model from MetaAI by Goyal et al. 2022. Paper link: https://arxiv.org/abs/2202.08360 Official implementation: https://github.com/facebookresearch/vissl/tree/main/projects/SEER Short description: The 10 billion parameter SEER model from u/MetaAI is *fairer*, even though it is trained on *uncurated* data. How so? Check out our take on this. Outline: 00:00 Training on uncurated data 01:12 Diffgram (Sponsor) 01:46 Toxicity in large models 02:43 What to do against model toxicity? 03:53 SEER model explained 06:52 SEER is fairer. But how? submitted by /u/AICoffeeBreak [link] [comments]  ( 1 min )
  • Open

    Is there an AI I could use to create an artificial Terence McKenna chatbot?
    He’s basically this wacky dead philosopher with 1000s of hours of his lectures on YT, and I was thinking it may be possible to create an AI personality of him from all of his recorded speech? Would there be a simple enough program I could download, or anything of the sort? submitted by /u/Vaporshots [link] [comments]  ( 1 min )
    Boids: An artificial life simulation of a flock of birds
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    AI Trippy Dream 37 - Psychedelic Special Request
    submitted by /u/LordPewPew777 [link] [comments]
    Amazing Generation
    Looks amazing. The vibe is there. What do you think? How did he achieve this? Created by hand or with AI? https://www.tiktok.com/@ai.metascape/video/7086451191151971586 submitted by /u/PillowG1rl [link] [comments]
    Little Baby Chibi Lucy Loud
    submitted by /u/VIRUS-AOTOXIN [link] [comments]
    Artificial Intelligence is the Future of Deterrence
    submitted by /u/much_successes [link] [comments]
    LinkedIn Open-Sources ‘Feathr’, Its Feature Store To Simplify Machine Learning (ML) Feature Management And Improve Developer Productivity
    LinkedIn's research team has recently open-sourced its feature store, Feathr, created to simplify machine learning (ML) feature management and increase developer productivity. Feathr is used by dozens of LinkedIn applications to define features, compute them for training, deploy them in production, and share them across consumers. Compared to previous application-specific feature pipeline solutions, Feathr users reported significantly reduced time required to add new features to model training and improved runtime performance. Hundreds of ML models run on LinkedIn in Search, Feed, and Ads applications. Thousands of features about entities in the Economic Graph, such as companies, job postings, and LinkedIn members, power the models. The most time-consuming aspects of handling ML applications at scale have been preparing and managing features. Continue reading the summary Github: https://github.com/linkedin/feathr LinkedIn Blog: https://engineering.linkedin.com/blog/2022/open-sourcing-feathr—linkedin-s-feature-store-for-productive-m submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Artificial Nightmares: Crypt Walker || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    I created a DIY python package to ensemble multimodal models
    Multimodal: A python package to ensemble speech, text, etc. models and build new applications. Sample Applications: Speech Named Entity Anonymizer, Speech Question Answering, Speech Generation Code: kritiksoman/Multimodal: Listen. Write. Speak. Read. Think. (github.com) submitted by /u/kritiksoman [link] [comments]
  • Open

    Rigorous treatment of MDPs, Bellman, etc. in continuous spaces?
    I am looking for a book/monograph that goes through all the basics of reinforcement learning for continuous spaces with mathematical rigor. The classic RL book from Sutton/Barto and the new RL theory book from Agarwal/Jiang/Kakade/Sun both stick to finite MDPs except for special cases like linear MDPs and the LQR. I assume that a general statement of the fundamentals for continuous spaces will require grinding through a lot of details on existence, measurability, suprema vs. maxima, etc., that are not issues in the finite case. Is this why these authors avoid it? clarifying edit: I don't need to go all the way to continuous time - just state and action spaces. Maybe one of Bertsekas's books? submitted by /u/quadprog [link] [comments]  ( 1 min )
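For concreteness, the object of study in the continuous-space case is the Bellman optimality operator with sup and integral in place of max and sum, where every term now needs measurability conditions (a sketch of the standard form):

```latex
% Bellman optimality operator on a measurable state space (S, \mathcal{S})
% with action space A, transition kernel P, bounded reward r, discount \gamma:
(T V)(s) \;=\; \sup_{a \in A} \Big[ r(s,a) + \gamma \int_{S} V(s')\, P(ds' \mid s, a) \Big]
```

Bertsekas and Shreve's *Stochastic Optimal Control: The Discrete-Time Case* is the classic rigorous treatment of exactly these measurable-selection and sup-vs-max issues; Hernández-Lerma and Lasserre's *Discrete-Time Markov Control Processes* is a more recent alternative at a similar level of rigor.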
    NEED HELP MAKING A BASIC PYTHON MODEL
    I have a 2-column dataset, “Date” and “Result”. The Result column contains a 0 or 1 for each date. I need to make a reinforcement model that will predict whether the next result will be a 0 or 1. It needs to be done in a Jupyter notebook. submitted by /u/EffectiveBug4629 [link] [comments]  ( 1 min )
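As a baseline before anything fancier (and note this is arguably a supervised problem rather than reinforcement learning): use the previous k results as features and fit a logistic regression. A minimal numpy sketch with a synthetic stand-in for the Result column, runnable as-is in a Jupyter cell:

```python
import numpy as np

rng = np.random.default_rng(2)
results = (rng.random(300) < 0.7).astype(float)   # stand-in for the Result column

k = 5  # use the previous 5 results as features
X = np.array([results[i - k:i] for i in range(k, len(results))])
y = results[k:]

# Minimal logistic regression by gradient descent (no sklearn needed).
w, b = np.zeros(k), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

next_prob = 1.0 / (1.0 + np.exp(-(results[-k:] @ w + b)))  # P(next result = 1)
```

If the baseline can't beat always predicting the majority class, the sequence probably carries no usable signal, and no model (RL or otherwise) will do better.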
    From machine learning to sequential decision problems (reinforcement learning)
    Any reinforcement learning problem can be modeled as a sequential decision problem (SDP), which can always be modeled as a Markov decision process. An example of an SDP is a multi-armed bandit problem, where the state is the vector of beliefs about the performance of each arm (or beliefs about a continuous parametric model). Decisions are made by a policy, and there are four classes of policies. For some reason, the RL community tends to focus on just one of the four classes (UCB policies, which fall in the class of cost function approximations), but there are entire communities using each of the other three classes. See chapter 7 of my new book for a complete summary of the four classes for pure learning problems (aka bandit problems). See https://tinyurl.com/RLandSO/ Curious why Sutton and Barto (2nd edition) cover bandit problems in chapter 2, and then introduce MDPs in chapter 3. A bandit problem *is* an MDP! submitted by /u/powell-sda [link] [comments]  ( 1 min )
    Policy gradient vs. Policy iteration?
    Hello, I'm currently learning about MDPs and machine learning. I have a few questions that might be trivial or obvious, but I can't find many concrete answers online: - Are policy gradient and policy iteration similar/the same? From what I can gather, policy iteration is a type or subset of policy gradient algorithm; is this correct? - Are all policy learning methods less effective for large state spaces? From my understanding, you need to use some kind of value function iteration and heuristic function for larger state spaces because you can't encounter all states enough times to converge on an optimal policy. - Does convergence on a policy/value function find a local or global optimum? With neural nets, simple backpropagation may only find a local minimum of the cost function; is this true of MDP/RL iteration algorithms? Thanks!! submitted by /u/egad_a_mouse [link] [comments]  ( 2 min )
    How to create a layer without inputs in TensorFlow?
    In deep RL algorithms like PPO, a continuous stochastic policy is represented by a Normal distribution. The recommended way of creating it is to get the mean by passing the state through a NN and then use a state-independent layer to predict log_std. This layer, which predicts log_std, should be trainable via backprop, just like biases. So how do I create this layer in TensorFlow 2? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
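One common pattern (a sketch, not the only way): make log_std a plain `tf.Variable` on the model rather than a layer with inputs. It is then tracked in `trainable_variables` and receives gradients like any bias. Assuming TensorFlow 2.x:

```python
import tensorflow as tf

class GaussianPolicy(tf.keras.Model):
    """Mean comes from the state; log_std is a free, state-independent vector."""
    def __init__(self, action_dim):
        super().__init__()
        self.body = tf.keras.layers.Dense(64, activation="tanh")
        self.mean_head = tf.keras.layers.Dense(action_dim)
        # State-independent parameter, updated by backprop like a bias.
        self.log_std = tf.Variable(tf.zeros(action_dim), trainable=True,
                                   name="log_std")

    def call(self, states):
        mean = self.mean_head(self.body(states))
        std = tf.exp(self.log_std)
        return mean, tf.broadcast_to(std, tf.shape(mean))

policy = GaussianPolicy(action_dim=2)
mean, std = policy(tf.zeros((4, 8)))      # batch of 4 states, 8 features each
```

Inside a `tf.GradientTape` block, gradients of the PPO loss with respect to `policy.trainable_variables` will include `log_std` automatically.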
  • Open

    Machine Learning-based Anomaly Detection in Optical Fiber Monitoring. (arXiv:2204.07059v1 [cs.NI])
    Secure and reliable data communication in optical networks is critical for high-speed Internet. However, optical fibers, serving as the data transmission medium providing connectivity to billions of users worldwide, are prone to a variety of anomalies resulting from hard failures (e.g., fiber cuts) and malicious physical attacks (e.g., optical eavesdropping (fiber tapping)), etc. Such anomalies may cause network disruption and thereby induce huge financial and data losses, compromise the confidentiality of optical networks by allowing unauthorized access to the carried data, or gradually degrade network operations. It is therefore essential to implement efficient anomaly detection, diagnosis, and localization schemes to enhance the availability and reliability of optical networks. In this paper, we propose a data-driven approach to accurately and quickly detect, diagnose, and localize fiber anomalies, including fiber cuts and optical eavesdropping attacks. The proposed method combines autoencoder-based anomaly detection with an attention-based bidirectional gated recurrent unit algorithm, whereby the former is used for fault detection and the latter is adopted for fault diagnosis and localization once an anomaly is detected by the autoencoder. We verify the efficiency of our proposed approach by experiments under various anomaly scenarios using real operational data. The experimental results demonstrate that: (i) the autoencoder detects any fiber fault or anomaly with an F1 score of 96.86%; and (ii) the attention-based bidirectional gated recurrent unit algorithm identifies the detected anomalies with an average accuracy of 98.2%, and localizes the faults with an average root mean square error of 0.19 m.  ( 2 min )
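The autoencoder stage of such a pipeline can be caricatured with a linear autoencoder (i.e., PCA): flag a trace as anomalous when its reconstruction error exceeds a threshold fit on normal data. A minimal numpy sketch with synthetic stand-ins for the monitoring traces:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic "normal" monitoring traces: 500 samples, 10 features, small scale.
normal = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10)) * 0.1

# Fit a rank-3 linear "autoencoder" (PCA) on normal traces only.
mu = normal.mean(axis=0)
U, S, Vt = np.linalg.svd(normal - mu, full_matrices=False)
V = Vt[:3].T                              # encoder/decoder weights

def recon_error(x):
    z = (x - mu) @ V                                       # encode
    return np.linalg.norm((x - mu) - z @ V.T, axis=-1)     # decode + error

threshold = np.percentile(recon_error(normal), 99)
anomaly = np.full(10, 50.0)               # e.g. a fiber-cut-like spike
is_anomaly = recon_error(anomaly) > threshold
```

The paper's nonlinear autoencoder plays the same role with far more capacity; the diagnosis/localization step (the attention-based BiGRU) only runs on samples flagged here.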
    Robust No-Regret Learning in Min-Max Stackelberg Games. (arXiv:2203.14126v2 [cs.GT] UPDATED)
    The behavior of no-regret learning algorithms is well understood in two-player min-max (i.e, zero-sum) games. In this paper, we investigate the behavior of no-regret learning in min-max games with dependent strategy sets, where the strategy of the first player constrains the behavior of the second. Such games are best understood as sequential, i.e., min-max Stackelberg, games. We consider two settings, one in which only the first player chooses their actions using a no-regret algorithm while the second player best responds, and one in which both players use no-regret algorithms. For the former case, we show that no-regret dynamics converge to a Stackelberg equilibrium. For the latter case, we introduce a new type of regret, which we call Lagrangian regret, and show that if both players minimize their Lagrangian regrets, then play converges to a Stackelberg equilibrium. We then observe that online mirror descent (OMD) dynamics in these two settings correspond respectively to a known nested (i.e., sequential) gradient descent-ascent (GDA) algorithm and a new simultaneous GDA-like algorithm, thereby establishing convergence of these algorithms to Stackelberg equilibrium. Finally, we analyze the robustness of OMD dynamics to perturbations by investigating online min-max Stackelberg games. We prove that OMD dynamics are robust for a large class of online min-max games with independent strategy sets. In the dependent case, we demonstrate the robustness of OMD dynamics experimentally by simulating them in online Fisher markets, a canonical example of a min-max Stackelberg game with dependent strategy sets.  ( 2 min )
    Open-Set Recognition: a Good Closed-Set Classifier is All You Need?. (arXiv:2110.06207v2 [cs.CV] CROSS LISTED)
    The ability to identify whether or not a test sample belongs to one of the semantic classes in a classifier's training set is critical to practical deployment of the model. This task is termed open-set recognition (OSR) and has received significant attention in recent years. In this paper, we first demonstrate that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes. We find that this relationship holds across loss objectives and architectures, and further demonstrate the trend both on the standard OSR benchmarks as well as on a large-scale ImageNet evaluation. Second, we use this correlation to boost the performance of a maximum logit score OSR 'baseline' by improving its closed-set accuracy, and with this strong baseline achieve state-of-the-art on a number of OSR benchmarks. Similarly, we boost the performance of the existing state-of-the-art method by improving its closed-set accuracy, but the resulting discrepancy with the strong baseline is marginal. Our third contribution is to present the 'Semantic Shift Benchmark' (SSB), which better respects the task of detecting semantic novelty, in contrast to other forms of distribution shift also considered in related sub-fields, such as out-of-distribution detection. On this new evaluation, we again demonstrate that there is negligible difference between the strong baseline and the existing state-of-the-art. Project Page: https://www.robots.ox.ac.uk/~vgg/research/osr/  ( 2 min )
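The maximum logit score baseline the abstract builds on is simple enough to sketch: score each test sample by its largest logit, and reject as "none of the above" anything scoring below a threshold fit on known-class data. A toy numpy illustration with synthetic logits:

```python
import numpy as np

def mls_scores(logits):
    """Maximum Logit Score: higher means 'looks like a known class'."""
    return logits.max(axis=1)

rng = np.random.default_rng(4)
known = rng.normal(size=(100, 10))
known[np.arange(100), rng.integers(0, 10, 100)] += 8.0   # confident peaks
unknown = rng.normal(size=(100, 10))                     # flat, unconfident logits

tau = np.percentile(mls_scores(known), 5)    # threshold fit on known data
open_set_flags = mls_scores(unknown) < tau   # True = 'none of the above'
```

The paper's point is that improving closed-set accuracy sharpens those "confident peaks", which directly improves this open-set score with no extra machinery.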
    Gradient boosting for convex cone predict and optimize problems. (arXiv:2204.06895v1 [cs.LG])
    Many problems in engineering and statistics involve both predictive forecasting and decision-based optimization. Traditionally, predictive models are optimized independently from the final decision-based optimization problem. In contrast, a `smart, predict then optimize' (SPO) framework optimizes prediction models to explicitly minimize the final downstream decision loss. In this paper we present dboost, a gradient boosting algorithm for training prediction model ensembles to minimize decision regret. The dboost framework supports any convex optimization program that can be cast as a convex quadratic cone program, and gradient boosting is performed by implicit differentiation of a custom fixed-point mapping. To our knowledge, the dboost framework is the first general-purpose implementation of gradient boosting for predict-and-optimize problems. Experimental results comparing with state-of-the-art SPO methods show that dboost can further reduce out-of-sample decision regret.  ( 2 min )
    YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss. (arXiv:2204.06806v1 [cs.CV])
    We introduce YOLO-Pose, a novel heatmap-free approach for joint detection and 2D multi-person pose estimation in an image, based on the popular YOLO object detection framework. Existing heatmap-based two-stage approaches are sub-optimal as they are not end-to-end trainable and training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e., Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass, thus bringing in the best of both top-down and bottom-up approaches. The proposed approach doesn't require the postprocessing of bottom-up approaches to group detected keypoints into a skeleton, as each bounding box has an associated pose, resulting in an inherent grouping of the keypoints. Unlike top-down approaches, multiple forward passes are done away with since all persons are localized along with their pose in a single inference. YOLO-Pose achieves new state-of-the-art results on COCO validation (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip test, multi-scale testing, or any other test-time augmentation. All experiments and results reported in this paper are without any test-time augmentation, unlike traditional approaches that use flip-test and multi-scale testing to boost performance. Our training codes will be made publicly available at https://github.com/TexasInstruments/edgeai-yolov5 and https://github.com/TexasInstruments/edgeai-yolox  ( 2 min )
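For readers unfamiliar with OKS: it is a Gaussian-weighted keypoint agreement averaged over visible keypoints. A minimal numpy sketch (the per-keypoint constants `k` and the toy coordinates below are made up for illustration; COCO publishes standard per-keypoint sigmas):

```python
import numpy as np

def oks(pred, gt, visible, area, k):
    """Object Keypoint Similarity between predicted and ground-truth keypoints.

    pred, gt: (N, 2) arrays; visible: (N,) bool mask; area: object area
    (so s^2 = area with s the object scale); k: (N,) per-keypoint constants.
    """
    d2 = ((pred - gt) ** 2).sum(axis=1)        # squared pixel distances
    e = np.exp(-d2 / (2 * area * k ** 2))      # Gaussian falloff per keypoint
    return e[visible].mean()                   # average over visible keypoints

gt = np.array([[10.0, 10.0], [20.0, 20.0]])
pred = gt + np.array([[0.0, 0.0], [1.0, 1.0]])  # second keypoint is off by (1, 1)
score = oks(pred, gt, visible=np.array([True, True]),
            area=100.0, k=np.array([0.5, 0.5]))
```

Since every operation here is differentiable in `pred`, `1 - oks` can serve directly as a loss, which is the paper's core move over surrogate L1 training.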
    MARF: Multiscale Adaptive-switch Random Forest for Leg Detection with 2D Laser Scanners. (arXiv:2204.06833v1 [cs.RO])
    For 2D laser-based tasks, e.g., people detection and people tracking, leg detection is usually the first step. Thus, it carries great weight in determining the performance of people detection and people tracking. However, many leg detectors ignore the inevitable noise and the multiscale characteristics of the laser scan, which makes them sensitive to unreliable features of the point cloud and further degrades the performance of the leg detector. In this paper, we propose a multiscale adaptive-switch Random Forest (MARF) to overcome these two challenges. Firstly, the adaptive-switch decision tree is designed to use noise-sensitive features to conduct weighted classification and noise-invariant features to conduct binary classification, which makes our detector perform more robustly to noise. Secondly, considering the multiscale property that the sparsity of the 2D point cloud is proportional to the length of laser beams, we design a multiscale random forest structure to detect legs at different distances. Moreover, the proposed approach allows us to discover a sparser human leg from point clouds than others. Consequently, our method shows improved performance compared to other state-of-the-art leg detectors on the challenging Moving Legs dataset and retains the whole pipeline at a speed of 60+ FPS on low-computational-power laptops. Moreover, we further apply the proposed MARF to a people detection and tracking system, achieving a considerable gain in all metrics.  ( 2 min )
    Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations. (arXiv:2202.07800v2 [cs.CV] UPDATED)
    Vision Transformers (ViTs) take all the image patches as tokens and construct multi-head self-attention (MHSA) among them. Complete leverage of these image tokens brings redundant computations since not all the tokens are attentive in MHSA. Examples include that tokens containing semantically meaningless or distractive image backgrounds do not positively contribute to the ViT predictions. In this work, we propose to reorganize image tokens during the feed-forward process of ViT models, which is integrated into ViT during training. For each forward inference, we identify the attentive image tokens between MHSA and FFN (i.e., feed-forward network) modules, which is guided by the corresponding class token attention. Then, we reorganize image tokens by preserving attentive image tokens and fusing inattentive ones to expedite subsequent MHSA and FFN computations. To this end, our method EViT improves ViTs from two perspectives. First, under the same amount of input image tokens, our method reduces MHSA and FFN computation for efficient inference. For instance, the inference speed of DeiT-S is increased by 50% while its recognition accuracy is decreased by only 0.3% for ImageNet classification. Second, by maintaining the same computational cost, our method empowers ViTs to take more image tokens as input for recognition accuracy improvement, where the image tokens are from higher resolution images. An example is that we improve the recognition accuracy of DeiT-S by 1% for ImageNet classification at the same computational cost of a vanilla DeiT-S. Meanwhile, our method does not introduce more parameters to ViTs. Experiments on the standard benchmarks show the effectiveness of our method. The code is available at https://github.com/youweiliang/evit  ( 2 min )
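The token reorganization itself is easy to sketch outside a full ViT: rank image tokens by the class token's attention over them, keep the top-k, and fuse the inattentive rest into a single token. A numpy caricature (shapes and the attention-weighted fusion follow the paper's description; the attention values here are random stand-ins):

```python
import numpy as np

def reorganize_tokens(tokens, cls_attn, keep):
    """Keep the `keep` most-attended tokens; fuse the rest into one token.

    tokens: (N, D) image tokens; cls_attn: (N,) class-token attention over them.
    """
    order = np.argsort(-cls_attn)
    top, rest = order[:keep], order[keep:]
    # Attention-weighted average of the inattentive tokens -> one fused token.
    fused = (cls_attn[rest, None] * tokens[rest]).sum(0) / cls_attn[rest].sum()
    return np.vstack([tokens[top], fused])   # (keep + 1, D)

rng = np.random.default_rng(5)
tokens = rng.normal(size=(196, 64))          # 14x14 patch tokens
attn = rng.random(196)                       # stand-in class-token attention
out = reorganize_tokens(tokens, attn, keep=98)
```

Halving the token count this way roughly halves the cost of the subsequent MHSA and FFN blocks, which is where the reported 50% inference speedup comes from.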
    Learning Task-Aware Energy Disaggregation: a Federated Approach. (arXiv:2204.06767v1 [cs.LG])
    We consider the problem of learning energy disaggregation signals for residential load data. Such a task is referred to as non-intrusive load monitoring (NILM); in order to find individual devices' power consumption profiles based on aggregated meter measurements, a machine learning model is usually trained on a large amount of training data coming from a number of residential homes. Yet collecting such residential load datasets requires both huge effort and customers' approval on sharing metering data, while load data coming from different regions or electricity users may exhibit heterogeneous usage patterns. Both practical concerns make training a single, centralized NILM model challenging. In this paper, we propose a decentralized and task-adaptive learning scheme for NILM tasks, where nested meta learning and federated learning steps are designed for learning task-specific models collectively. Simulation results on a benchmark dataset validate the proposed algorithm's performance on efficiently inferring appliance-level consumption for a variety of homes and appliances.  ( 2 min )
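The federated step can be sketched independently of the NILM specifics: clients train locally on their own metering data, and a server averages their parameters weighted by local data size (FedAvg-style; the paper's nested meta-learning refinement is not shown). A minimal numpy sketch with hypothetical per-home parameter vectors:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg: average client parameter vectors weighted by local data size."""
    sizes = np.asarray(client_sizes, dtype=float)
    W = np.stack(client_weights)
    return (sizes[:, None] * W).sum(0) / sizes.sum()

# Three hypothetical homes with locally trained NILM model parameters.
homes = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
global_w = fed_avg(homes, client_sizes=[10, 10, 20])
```

The privacy appeal is that only these parameter vectors, never the raw metering traces, ever leave a home.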
    Ranking Feature-Block Importance in Artificial Multiblock Neural Networks. (arXiv:2109.10279v2 [cs.LG] UPDATED)
    In artificial neural networks, understanding the contributions of input features on the prediction fosters model explainability and delivers relevant information about the dataset. While typical setups for feature importance ranking assess input features individually, in this study, we go one step further and rank the importance of groups of features, denoted as feature-blocks. A feature-block can contain features of a specific type or features derived from a particular source, which are presented to the neural network in separate input branches (multiblock ANNs). This work presents three methods pursuing distinct strategies to rank features in multiblock ANNs by their importance: (1) a composite strategy building on individual feature importance rankings, (2) a knock-in, and (3) a knock-out strategy. While the composite strategy builds on state-of-the-art feature importance rankings, knock-in and knock-out strategies evaluate the block as a whole via a mutual information criterion. Our experiments consist of a simulation study validating all three approaches, followed by a case study on two distinct real-world datasets to compare the strategies. We conclude that each strategy has its merits for specific application scenarios.  ( 2 min )
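Of the three strategies, the knock-out idea is the easiest to sketch generically: zero out an entire feature-block and measure how much a model's score drops (the paper's actual criterion is mutual-information-based; the score function below is a made-up stand-in):

```python
import numpy as np

def knock_out_importance(score_fn, X, blocks):
    """Score drop when each feature-block is zeroed out ('knocked out')."""
    base = score_fn(X)
    drops = {}
    for name, cols in blocks.items():
        Xk = X.copy()
        Xk[:, cols] = 0.0                 # knock the whole block out at once
        drops[name] = base - score_fn(Xk)
    return drops

# Toy score: a "model" that only uses columns 0-1 (block "a").
score = lambda X: float(np.mean(X[:, 0] + X[:, 1]))
X = np.ones((50, 4))
drops = knock_out_importance(score, X, {"a": [0, 1], "b": [2, 3]})
```

Ranking blocks jointly like this captures within-block interactions that per-feature importance rankings, as in the composite strategy, can miss.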
    Finding MNEMON: Reviving Memories of Node Embeddings. (arXiv:2204.06963v1 [cs.LG])
    Previous security research efforts orbiting around graphs have been exclusively focusing on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understanding the privacy risks of integrating the output from graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In this paper, we fill this gap and propose a novel model-agnostic graph recovery attack that exploits the implicit graph structural information preserved in the embeddings of graph nodes. We show that an adversary can recover edges with decent accuracy by only gaining access to the node embedding matrix of the original graph without interactions with the node embedding models. We demonstrate the effectiveness and applicability of our graph recovery attack through extensive experiments.  ( 2 min )
    GM-TOuNN: Graded Multiscale Topology Optimization using Neural Networks. (arXiv:2204.06682v1 [cs.CE])
    Multiscale topology optimization (M-TO) entails generating an optimal global topology, and an optimal set of microstructures at a smaller scale, for a physics-constrained problem. With the advent of additive manufacturing, M-TO has gained significant prominence. However, generating optimal microstructures at various locations can be computationally very expensive. As an alternative, graded multiscale topology optimization (GM-TO) has been proposed where one or more pre-selected and graded (parameterized) microstructural topologies are used to fill the domain optimally. This leads to a significant reduction in computation while retaining many of the benefits of M-TO. A successful GM-TO framework must: (1) be capable of efficiently handling numerous pre-selected microstructures, (2) be able to continuously switch between these microstructures during optimization, (3) ensure that the partition of unity is satisfied, and (4) discourage microstructure mixing at termination. In this paper, we propose to meet these requirements by exploiting the unique classification capacity of neural networks. Specifically, we propose a graded multiscale topology optimization using neural-network (GM-TOuNN) framework with the following features: (1) the number of design variables is only weakly dependent on the number of pre-selected microstructures, (2) it guarantees partition of unity while discouraging microstructure mixing, and (3) it supports automatic differentiation, thereby eliminating manual sensitivity analysis. The proposed framework is illustrated through several examples.  ( 2 min )
    Medical Application of Geometric Deep Learning for the Diagnosis of Glaucoma. (arXiv:2204.07004v1 [eess.IV])
    Purpose: (1) To assess the performance of geometric deep learning (PointNet) in diagnosing glaucoma from a single optical coherence tomography (OCT) 3D scan of the optic nerve head (ONH); (2) To compare its performance to that obtained with a standard 3D convolutional neural network (CNN), and with a gold-standard glaucoma parameter, i.e. retinal nerve fiber layer (RNFL) thickness. Methods: 3D raster scans of the ONH were acquired with Spectralis OCT for 477 glaucoma and 2,296 non-glaucoma subjects at the Singapore National Eye Centre. All volumes were automatically segmented using deep learning to identify 7 major neural and connective tissues including the RNFL, the prelamina, and the lamina cribrosa (LC). Each ONH was then represented as a 3D point cloud with 1,000 points chosen randomly from all tissue boundaries. To simplify the problem, all ONH point clouds were aligned with respect to the plane and center of Bruch's membrane opening. Geometric deep learning (PointNet) was then used to provide a glaucoma diagnosis from a single OCT point cloud. The performance of our approach was compared to that obtained with a 3D CNN, and with RNFL thickness. Results: PointNet was able to provide a robust glaucoma diagnosis solely from the ONH represented as a 3D point cloud (AUC=95%). The performance of PointNet was superior to that obtained with a standard 3D CNN (AUC=87%) and with that obtained from RNFL thickness alone (AUC=80%). Discussion: We provide a proof-of-principle for the application of geometric deep learning in the field of glaucoma. Our technique requires significantly less information as input to perform better than a 3D CNN, and with an AUC superior to that obtained from RNFL thickness alone. Geometric deep learning may have wide applicability in the field of Ophthalmology.  ( 2 min )
    Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms. (arXiv:2204.06664v1 [stat.ML])
    Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is simply collecting more representative training data. In this paper, we consider the task of assembling a training dataset in which minority groups are adequately represented from a given set of data sources. In essence, this is an adaptive sampling problem to determine if a given point lies in the convex hull of the means from a set of unknown distributions. We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources. We also demonstrate the efficacy of our policies in simulations in both Bernoulli and multinomial settings.  ( 2 min )
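The core geometric test, whether a point lies in the convex hull of a set of means, can be sketched for the simplest 2D case of three known means forming a triangle, using barycentric coordinates. The paper's setting is harder because the means are unknown and must be estimated by adaptive sampling; this sketch only shows the feasibility check itself.

```python
import numpy as np

def in_hull_triangle(p, a, b, c, tol=1e-9):
    """Check whether 2D point p lies in the convex hull (triangle) of a, b, c
    by solving for its barycentric coordinates."""
    T = np.column_stack([b - a, c - a])          # 2x2 basis matrix
    w = np.linalg.solve(T, np.asarray(p) - a)    # weights for b and c
    w0 = 1.0 - w.sum()                           # weight for a
    return bool(w0 >= -tol and np.all(w >= -tol))

# Hypothetical group means (e.g., estimated from three data sources).
means = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(in_hull_triangle(np.array([0.25, 0.25]), *means))  # inside the hull
print(in_hull_triangle(np.array([0.8, 0.8]), *means))    # outside the hull
```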
    Word Embeddings Are Capable of Capturing Rhythmic Similarity of Words. (arXiv:2204.04833v2 [cs.CL] UPDATED)
    Word embedding systems such as Word2Vec and GloVe are well-known in deep learning approaches to NLP. This is largely due to their ability to capture semantic relationships between words. In this work we investigated their usefulness in capturing rhythmic similarity of words instead. The results show that the vectors these embeddings assign to rhyming words are more similar to each other than to those of other words. It is also revealed that GloVe performs relatively better than Word2Vec in this regard. We also proposed a first-of-its-kind metric for quantifying rhythmic similarity of a pair of words.  ( 2 min )
    BrainGB: A Benchmark for Brain Network Analysis with Graph Neural Networks. (arXiv:2204.07054v1 [q-bio.NC])
    Mapping the connectome of the human brain using structural or functional connectivity has become one of the most pervasive paradigms for neuroimaging analysis. Recently, Graph Neural Networks (GNNs) motivated from geometric deep learning have attracted broad interest due to their established power for modeling complex networked data. Despite their established performance in other fields, there has not yet been a systematic study of how to design effective GNNs for brain network analysis. To bridge this gap, we present BrainGB, a benchmark for brain network analysis with GNNs. BrainGB standardizes the process by 1) summarizing brain network construction pipelines for both functional and structural neuroimaging modalities and 2) modularizing the implementation of GNN designs. We conduct extensive experiments on datasets across cohorts and modalities and recommend a set of general recipes for effective GNN designs on brain networks. To support open and reproducible research on GNN-based brain network analysis, we also host the BrainGB website at https://brainnet.us/ with models, tutorials, examples, as well as an out-of-the-box Python package. We hope that this work will provide useful empirical evidence and offer insights for future research in this novel and promising direction.  ( 2 min )
    Accelerated Policy Learning with Parallel Differentiable Simulation. (arXiv:2204.07137v1 [cs.LG])
    Deep reinforcement learning can generate complex control policies, but requires large amounts of training data to work effectively. Recent work has attempted to address this issue by leveraging differentiable simulators. However, inherent problems such as local minima and exploding/vanishing numerical gradients prevent these methods from being generally applied to control tasks with complex contact-rich dynamics, such as humanoid locomotion in classical RL benchmarks. In this work we present a high-performance differentiable simulator and a new policy learning algorithm (SHAC) that can effectively leverage simulation gradients, even in the presence of non-smoothness. Our learning algorithm alleviates problems with local minima through a smooth critic function, avoids vanishing/exploding gradients through a truncated learning window, and allows many physical environments to be run in parallel. We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17x reduction in training time over the best-performing established RL algorithm.  ( 2 min )
    Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings. (arXiv:2104.08928v2 [stat.ML] UPDATED)
    Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retailing to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient is tested positive for a disease. Intuitively, we expect that only a small number of domain-specific words may have new meanings/usages. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our estimator, proving that it can achieve the same accuracy (compared to not transfer learning) with substantially less domain-specific data when only a small number of embeddings are altered between domains. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate the effectiveness of our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.  ( 2 min )
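The group-sparse penalty at the heart of such estimators is typically handled with a group soft-thresholding (proximal) step that zeroes out whole groups whose norm is small, so that only a few domain-specific embeddings are altered. A minimal numpy sketch of that operator (the grouping and penalty level below are illustrative assumptions, not the paper's estimator):

```python
import numpy as np

def group_soft_threshold(V, groups, lam):
    """Proximal operator of the group-lasso penalty lam * sum_g ||V_g||_2:
    shrink each group toward zero, zeroing groups with small norm."""
    out = V.copy()
    for g in groups:
        norm = np.linalg.norm(V[g])
        out[g] = 0.0 if norm <= lam else (1 - lam / norm) * V[g]
    return out

V = np.array([3.0, 4.0, 0.1, -0.1])           # group 0 strong, group 1 weak
W = group_soft_threshold(V, [[0, 1], [2, 3]], lam=1.0)
print(W)   # group 0 shrunk toward zero, group 1 zeroed out entirely
```

The all-or-nothing shrinkage per group is what selects the small set of embeddings allowed to change between domains.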
    Latent Aspect Detection from Online Unsolicited Customer Reviews. (arXiv:2204.06964v1 [cs.CL])
    Within the context of review analytics, aspects are the features of products and services at which customers target their opinions and sentiments. Aspect detection helps product owners and service providers to identify shortcomings and prioritize customers' needs, and hence, maintain revenues and mitigate customer churn. Existing methods focus on detecting the surface form of an aspect by training supervised learning methods that fall short when aspects are latent in reviews. In this paper, we propose an unsupervised method to extract latent occurrences of aspects. Specifically, we assume that a customer undergoes a two-stage hypothetical generative process when writing a review: (1) deciding on an aspect amongst the set of aspects available for the product or service, and (2) writing the opinion words that are most closely related to the chosen aspect from the set of all words available in a language. We employ latent Dirichlet allocation to learn the latent aspect distributions for generating the reviews. Experimental results on benchmark datasets show that our proposed method is able to improve the state of the art when the aspects are latent with no surface form in reviews.  ( 2 min )
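The two-stage generative process described above can be sketched directly: draw an aspect from a Dirichlet-distributed aspect mixture, then draw opinion words tied to that aspect. The aspect names and vocabularies below are invented toy data; the paper fits this kind of model to real reviews with LDA rather than simulating it.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical aspects with disjoint opinion vocabularies (toy illustration).
aspect_words = {
    "service": ["friendly", "slow", "helpful", "rude"],
    "price":   ["cheap", "expensive", "bargain", "overpriced"],
}
aspects = list(aspect_words)

def generate_review(n_words=5):
    # Stage 1: choose an aspect from a per-review aspect distribution.
    theta = rng.dirichlet(np.ones(len(aspects)))
    aspect = aspects[rng.choice(len(aspects), p=theta)]
    # Stage 2: draw opinion words associated with the chosen aspect.
    words = list(rng.choice(aspect_words[aspect], size=n_words))
    return aspect, words

aspect, review = generate_review()
print(aspect, review)
```

Inference with LDA inverts this process: given only the words, recover the latent aspect distributions.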
    Neighborhood Attention Transformer. (arXiv:2204.07143v1 [cs.CV])
    We present Neighborhood Attention Transformer (NAT), an efficient, accurate and scalable hierarchical transformer that works well on both image classification and downstream vision tasks. It is built upon Neighborhood Attention (NA), a simple and flexible attention mechanism that localizes the receptive field for each query to its nearest neighboring pixels. NA is a localization of self-attention, and approaches it as the receptive field size increases. It is also equivalent in FLOPs and memory usage to Swin Transformer's shifted window attention given the same receptive field size, while being less constrained. Furthermore, NA includes local inductive biases, which eliminate the need for extra operations such as pixel shifts. Experimental results on NAT are competitive; NAT-Tiny reaches 83.2% top-1 accuracy on ImageNet with only 4.3 GFLOPs and 28M parameters, 51.4% mAP on MS-COCO and 48.4% mIoU on ADE20k. We will open-source our checkpoints, training script, configurations, and our CUDA kernel at: https://github.com/SHI-Labs/Neighborhood-Attention-Transformer .  ( 2 min )
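The neighborhood attention mechanism restricts each query's attention to its nearest neighbors. A naive 1D sketch with identity Q/K/V projections illustrates the windowing; NAT itself operates on 2D feature maps with learned projections and an optimized CUDA kernel, so this is only a conceptual illustration.

```python
import numpy as np

def neighborhood_attention_1d(x, window=3):
    """Naive 1D neighborhood attention with identity Q/K/V projections:
    each position attends only to its `window` nearest positions."""
    n, d = x.shape
    half = window // 2
    out = np.empty_like(x)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        keys = x[lo:hi]                            # neighborhood keys/values
        scores = keys @ x[i] / np.sqrt(d)          # scaled dot-product scores
        w = np.exp(scores - scores.max())          # stable softmax
        w /= w.sum()
        out[i] = w @ keys                          # weighted average of values
    return out

x = np.ones((6, 4))                                # constant toy sequence
print(neighborhood_attention_1d(x))                # unchanged: all ones
```

A constant sequence passes through unchanged, since each output is a convex combination of identical neighbors.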
    Fix Bugs with Transformer through a Neural-Symbolic Edit Grammar. (arXiv:2204.06643v1 [cs.LG])
    We introduce NSEdit (neural-symbolic edit), a novel Transformer-based code repair method. Given only the source code that contains bugs, NSEdit predicts an editing sequence that can fix the bugs. The edit grammar is formulated as a regular language, and the Transformer uses it as a neural-symbolic scripting interface to generate editing programs. We modify the Transformer and add a pointer network to select the edit locations. An ensemble of rerankers is trained to re-rank the editing sequences generated by beam search. We fine-tune the rerankers on the validation set to reduce over-fitting. NSEdit is evaluated on various code repair datasets and achieved a new state-of-the-art accuracy ($24.04\%$) on the Tufano small dataset of the CodeXGLUE benchmark. NSEdit performs robustly when programs vary from package to package and when buggy programs are concrete. We conduct a detailed analysis of our methods and demonstrate the effectiveness of each component.  ( 2 min )
    Reflective Fiber Faults Detection and Characterization Using Long-Short-Term Memory. (arXiv:2204.07058v1 [cs.NI])
    To reduce operation-and-maintenance expenses (OPEX) and to ensure optical network survivability, optical network operators need to detect and diagnose faults in a timely manner and with high accuracy. With the rapid advancement of telemetry technology and data analysis techniques, data-driven approaches leveraging telemetry data to tackle the fault diagnosis problem have been gaining popularity due to their quick implementation and deployment. In this paper, we propose a novel multi-task learning model based on long short-term memory (LSTM) to detect, locate, and estimate the reflectance of fiber reflective faults (events) including the connectors and the mechanical splices by extracting insights from monitored data obtained by the optical time domain reflectometry (OTDR) principle commonly used for troubleshooting of fiber optic cables or links. The experimental results prove that the proposed method: (i) achieves a good detection capability and high localization accuracy within short measurement time even for low SNR values; and (ii) outperforms conventionally employed techniques.
    The Power of Linear Recurrent Neural Networks. (arXiv:1802.03308v6 [cs.LG] UPDATED)
    Recurrent neural networks are a powerful means to cope with time series. We show how linear, i.e., linearly activated recurrent neural networks (LRNNs) can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore, the size of an LRNN can be reduced significantly in one step, after inspecting the eigenvalues of the network transition matrix, by taking only the most relevant components. Therefore, in contrast to others, we do not only learn network weights but also the network architecture. LRNNs have interesting properties: They end up in ellipse trajectories in the long run and allow the prediction of further values and compact representations of functions. We demonstrate this by several experiments, among them multiple superimposed oscillators (MSO), robotic soccer, and predicting stock prices. LRNNs outperform the previous state-of-the-art for the MSO task with a minimal number of units.  ( 2 min )
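The claim that an LRNN can be learned "by simply solving a linear equation system" is easy to demonstrate: a sinusoid obeys an exact two-step linear recurrence, so stacking consecutive state vectors and solving a least-squares system recovers the transition matrix, which can then predict future values. This is a minimal sketch of the fitting idea, not the paper's full method (which also reduces network size via the transition matrix's eigenvalues).

```python
import numpy as np

# Values of f(t) = sin(0.3 t); a sinusoid obeys a linear two-step recurrence.
t = np.arange(50)
f = np.sin(0.3 * t)

# Build state vectors s_t = [f(t), f(t-1)] and solve s_{t+1} = W s_t by least squares.
S = np.column_stack([f[1:-1], f[:-2]])       # states at times 1..48
S_next = np.column_stack([f[2:], f[1:-1]])   # states at times 2..49
Wt, *_ = np.linalg.lstsq(S, S_next, rcond=None)
W = Wt.T                                     # so that s_{t+1} = W @ s_t

# Predict the next value beyond the observed range: f(50).
s = np.array([f[-1], f[-2]])
pred = (W @ s)[0]
print(pred, np.sin(0.3 * 50))
```

No backpropagation is involved: the transition matrix falls out of one `lstsq` call, and the prediction matches the true continuation to numerical precision.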
    Incompleteness of graph convolutional neural networks for points clouds in three dimensions. (arXiv:2201.07136v2 [stat.ML] UPDATED)
    Graph neural networks (GNN) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct graph representation for molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a chosen cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph NNs" (dGNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML, with all known indistinguishable graphs being resolved in the fully-connected limit. Here we show that even for the restricted case of fully-connected graphs induced by 3D atom clouds dGNNs are not complete. We construct pairs of distinct point clouds that generate graphs that, for any cutoff radius, are equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, setting an ultimate limit to the expressive power of some of the well-established GNN architectures for atomistic machine learning. Models that explicitly use angular or directional information in the description of atomic environments can resolve these degeneracies.  ( 2 min )
    Interpretability of Machine Learning Methods Applied to Neuroimaging. (arXiv:2204.07005v1 [cs.CV])
    Deep learning methods have become very popular for the processing of natural images, and were then successfully adapted to the neuroimaging field. As these methods are non-transparent, interpretability methods are needed to validate them and ensure their reliability. Indeed, it has been shown that deep learning models may obtain high performance even when using irrelevant features, by exploiting biases in the training set. Such undesirable situations can potentially be detected by using interpretability methods. Recently, many methods have been proposed to interpret neural networks. However, this domain is not mature yet. Machine learning users face two major issues when aiming to interpret their models: which method to choose, and how to assess its reliability? Here, we aim at providing answers to these questions by presenting the most common interpretability methods and metrics developed to assess their reliability, as well as their applications and benchmarks in the neuroimaging context. Note that this is not an exhaustive survey: we aimed to focus on the studies which we found to be the most representative and relevant.  ( 2 min )
    The MIT Supercloud Workload Classification Challenge. (arXiv:2204.05839v2 [cs.DC] UPDATED)
    High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogeneous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly large share of the compute workloads, new approaches to optimized resource usage, allocation, and deployment of new AI frameworks are needed. By identifying compute workloads and their utilization characteristics, HPC systems may be able to better match available resources with the application demand. By leveraging datacenter instrumentation, it may be possible to develop AI-based approaches that can identify workloads and provide feedback to researchers and datacenter operators for improving operational efficiency. To enable this research, we released the MIT Supercloud Dataset, which provides detailed monitoring logs from the MIT Supercloud cluster. This dataset includes CPU and GPU usage by jobs, memory usage, and file system logs. In this paper, we present a workload classification challenge based on this dataset. We introduce a labelled dataset that can be used to develop new approaches to workload classification and present initial results based on existing approaches. The goal of this challenge is to foster algorithmic innovations in the analysis of compute workloads that can achieve higher accuracy than existing methods. Data and code will be made publicly available via the Datacenter Challenge website: https://dcc.mit.edu.  ( 2 min )
    Learning and controlling the source-filter representation of speech with a variational autoencoder. (arXiv:2204.07075v1 [cs.SD])
    Understanding and controlling latent representations in deep generative models is a challenging yet important problem for analyzing, transforming and generating various types of data. In speech processing, drawing inspiration from the anatomical mechanisms of phonation, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors, among which the fundamental frequency $f_0$ and the formants are of primary importance. In this work, we show that the source-filter model of speech production naturally arises in the latent space of a variational autoencoder (VAE) trained in an unsupervised manner on a dataset of natural speech signals. Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we experimentally illustrate that $f_0$ and the formant frequencies are encoded in orthogonal subspaces of the VAE latent space and we develop a weakly-supervised method to accurately and independently control these speech factors of variation within the learned latent subspaces. Without requiring additional information such as text or human-labeled data, this results in a deep generative model of speech spectrograms that is conditioned on $f_0$ and the formant frequencies, and which is applied to the transformation of speech signals.  ( 2 min )
    Solving AC Power Flow with Graph Neural Networks under Realistic Constraints. (arXiv:2204.07000v1 [cs.LG])
    In this paper we propose a graph neural network architecture for solving the AC power flow problem under realistic constraints. While the energy transition is changing the energy industry to a digitalized and decentralized energy system, the challenges are increasingly shifting to the distribution grid level to integrate new loads and generation technologies. To ensure a safe and resilient operation of distribution grids, AC power flow calculations are the means of choice to determine grid operating limits or analyze grid asset utilization in planning procedures. In our approach we demonstrate the development of a framework which makes use of graph neural networks to learn the physical constraints of the power flow. We present our model architecture on which we perform unsupervised training to learn a general solution of the AC power flow formulation that is independent of the specific topologies and supply tasks used for training. Finally, we demonstrate, validate and discuss our results on medium voltage benchmark grids.  ( 2 min )
    Generative power of a protein language model trained on multiple sequence alignments. (arXiv:2204.07110v1 [q-bio.BM])
    Computational models starting from large ensembles of evolutionarily related protein sequences capture a representation of protein families and learn constraints associated to protein structure and function. They thus open the possibility for generating novel sequences belonging to protein families. Protein language models trained on multiple sequence alignments, such as MSA Transformer, are highly attractive candidates to this end. We propose and test an iterative method that directly uses the masked language modeling objective to generate sequences using MSA Transformer. We demonstrate that the resulting sequences generally score better than those generated by Potts models, and even than natural sequences, for homology, coevolution and structure-based measures. Moreover, MSA Transformer better reproduces the higher-order statistics and the distribution of sequences in sequence space of natural data than Potts models, although Potts models better reproduce first- and second-order statistics. MSA Transformer is thus a strong candidate for protein sequence generation and protein design.
    Matrix Completion with Heterogeneous Cost. (arXiv:2203.12120v2 [cs.LG] UPDATED)
    The matrix completion problem has been studied broadly under many underlying conditions. The problem has been explored under adaptive or non-adaptive, exact or estimation, single-phase or multi-phase, and many other categories. In most of these cases, the observation cost of each entry is uniform and is the same across the columns. However, in many real-life scenarios, we could expect elements from distinct columns or distinct positions to have different costs. In this paper, we explore this generalization under adaptive conditions. We approach the problem under two different cost models. In the first, entries from different columns have different observation costs, but, within the same column, each entry has a uniform cost. In the second, any two entries may have different observation costs, regardless of whether they lie in the same column. We provide a complexity analysis of our algorithms along with tightness guarantees.
    Semi-Supervised Convolutive NMF for Automatic Piano Transcription. (arXiv:2202.04989v2 [cs.SD] UPDATED)
    Automatic Music Transcription, which consists in transforming an audio recording of a musical performance into symbolic format, remains a difficult Music Information Retrieval task. In this work, which focuses on piano transcription, we propose a semi-supervised approach using low-rank matrix factorization techniques, in particular Convolutive Nonnegative Matrix Factorization. In the semi-supervised setting, only a single recording of each individual note is required. We show on the MAPS dataset that the proposed semi-supervised CNMF method performs better than state-of-the-art low-rank factorization techniques and a little worse than supervised deep learning state-of-the-art methods, while however suffering from generalization issues.
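The semi-supervised setting can be sketched in its simplest form: per-note spectral templates are fixed (as if learned from isolated note recordings), and only the note activations are estimated from the mixture via multiplicative NMF updates. This toy sketch uses plain Euclidean NMF with invented 4-bin templates, not the convolutive variant the paper employs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed per-note spectral templates (semi-supervised: from isolated notes).
W = np.array([[1.0, 0.0],
              [0.8, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])                # 4 frequency bins, 2 notes
H_true = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])      # note activations over 3 frames
V = W @ H_true                            # observed toy "spectrogram"

# Multiplicative updates for H with W held fixed (Euclidean objective).
H = rng.random(H_true.shape)
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)

print(np.round(H, 3))                     # recovered activations
```

Thresholding the recovered activations per frame yields the transcription.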
    HCFL: A High Compression Approach for Communication-Efficient Federated Learning in Very Large Scale IoT Networks. (arXiv:2204.06760v1 [cs.LG])
    Federated learning (FL) is a new artificial intelligence concept that enables Internet-of-Things (IoT) devices to learn a collaborative model without sending the raw data to centralized nodes for processing. Despite numerous advantages, low computing resources at IoT devices and high communication costs for exchanging model parameters make applications of FL in massive IoT networks very limited. In this work, we develop a novel compression scheme for FL, called high-compression federated learning (HCFL), for very large scale IoT networks. HCFL can reduce the data load for FL processes without changing their structure and hyperparameters. In this way, we not only can significantly reduce communication costs, but also make intensive learning processes more adaptable on low-computing resource IoT devices. Furthermore, we investigate a relationship between the number of IoT devices and the convergence level of the FL model and thereby better assess the quality of the FL process. We demonstrate our HCFL scheme in both simulations and mathematical analyses. Our proposed theoretical research can be used as a minimum level of satisfaction, proving that the FL process can achieve good performance when a determined configuration is met. Therefore, we show that HCFL is applicable in any FL-integrated networks with numerous IoT devices.
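HCFL's compression scheme is not reproduced here, but the kind of payload reduction at stake when IoT devices upload model parameters can be illustrated with a generic sketch: uniform 8-bit quantization of a float32 parameter vector, shrinking the transmitted bytes fourfold at the cost of a small, bounded reconstruction error.

```python
import numpy as np

def compress(params, bits=8):
    """Uniformly quantize float32 parameters to uint8 plus a scale range."""
    lo, hi = params.min(), params.max()
    levels = 2**bits - 1
    q = np.round((params - lo) / (hi - lo) * levels).astype(np.uint8)
    return q, lo, hi

def decompress(q, lo, hi, bits=8):
    """Reconstruct approximate float parameters from the quantized payload."""
    levels = 2**bits - 1
    return q.astype(np.float32) / levels * (hi - lo) + lo

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)   # toy model parameters
q, lo, hi = compress(w)
w_hat = decompress(q, lo, hi)
print(q.nbytes / w.nbytes)                       # 4x smaller payload
print(np.abs(w - w_hat).max())                   # bounded quantization error
```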
    Reinforcement Learning Policy Recommendation for Interbank Network Stability. (arXiv:2204.07134v1 [econ.GN])
    In this paper we analyze the effect of a policy recommendation on the performances of an artificial interbank market. Financial institutions stipulate lending agreements following a public recommendation and their individual information. The former, modeled by a reinforcement learning optimal policy trying to maximize the long term fitness of the system, gathers information on the economic environment and directs economic actors to create credit relationships based on the optimal choice between a low interest rate or high liquidity supply. The latter, based on the agents' balance sheets, makes it possible to determine the liquidity supply and interest rate that the banks optimally offer on the market. Based on the combination of the public and the private signal, financial institutions create or cut their credit connections over time via a preferential attachment evolving procedure able to generate a dynamic network. Our results show that the emergence of a core-periphery interbank network, combined with a certain level of homogeneity in the size of lenders and borrowers, are essential features to ensure the resilience of the system. Moreover, the reinforcement learning optimal policy recommendation plays a crucial role in mitigating systemic risk with respect to alternative policy instruments.
    Constrained Deep One-Class Feature Learning For Classifying Imbalanced Medical Images. (arXiv:2111.10610v2 [eess.IV] UPDATED)
    Medical image data are usually imbalanced across different classes. One-class classification has attracted increasing attention to address the data imbalance problem by distinguishing the samples of the minority class from the majority class. Previous methods generally aim to either learn a new feature space to map training samples together or to fit training samples by autoencoder-like models. These methods mainly focus on capturing either compact or descriptive features, where the information of the samples of a given one class is not sufficiently utilized. In this paper, we propose a novel deep learning-based method to learn compact features by adding constraints on the bottleneck features, and to preserve descriptive features by training an autoencoder at the same time. Through jointly optimizing the constraining loss and the autoencoder's reconstruction loss, our method can learn more relevant features associated with the given class, making the majority and minority samples more distinguishable. Experimental results on three clinical datasets (including MRI breast images, FFDM breast images and chest X-ray images) show state-of-the-art performance compared to previous methods.
    Supplementation of deep neural networks with simplified physics-based features to increase model prediction accuracy. (arXiv:2204.06764v1 [cs.ET])
    To improve predictive models for STEM applications, supplemental physics-based features computed from input parameters are introduced into single and multiple layers of a deep neural network (DNN). While many studies focus on informing DNNs with physics through differential equations or numerical simulation, much may be gained through integration of simplified relationships. To evaluate this hypothesis, a number of thin rectangular plates simply-supported on all edges are simulated for five materials. With plate dimensions and material properties as input features and fundamental natural frequency as the sole output, predictive performance of a purely data-driven DNN-based model is compared with models using additional inputs computed from simplified physical relationships among baseline parameters, namely plate weight, modulus of rigidity, and shear modulus. To better understand the benefit to model accuracy, these additional features are injected into various single and multiple DNN layers, and trained with four different dataset sizes. When these physics-enhanced models are evaluated against independent data of the same materials and similar dimensions to the training sets, supplementation with simplified physics-based parameters provides little reduction in prediction error over the baseline for models trained with dataset sizes of 60 and greater, although small improvement from 19.3% to 16.1% occurs when trained with a sparse size of 30. Conversely, notable accuracy gains occur when the independent test data is of material and dimensions not conforming to the training set. Specifically, when physics-enhanced data is injected into multiple DNN layers, reductions in error from 33.2% to 19.6%, 34.9% to 19.9%, 35.8% to 22.4%, and 43.0% to 28.4% are achieved for training dataset sizes of 261, 117, 60, and 30, respectively, demonstrating attainment of a degree of generalizability.
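The kind of supplemental feature named in the abstract can be sketched directly: compute plate weight, flexural rigidity, and shear modulus from the baseline inputs via standard textbook relationships, and concatenate them to the DNN input vector. The exact formulas and feature set used by the paper are not given in the abstract, so the choices below are illustrative assumptions.

```python
import numpy as np

def physics_features(a, b, h, E, nu, rho):
    """Supplemental features from simplified physical relationships:
    plate weight, flexural rigidity D, and shear modulus G."""
    weight = rho * a * b * h                  # plate mass [kg]
    D = E * h**3 / (12 * (1 - nu**2))         # flexural rigidity of a thin plate
    G = E / (2 * (1 + nu))                    # shear modulus (isotropic material)
    return np.array([weight, D, G])

# Baseline inputs: a, b, h [m], E [Pa], nu, rho [kg/m^3] (steel-like plate).
base = np.array([1.0, 0.5, 0.002, 200e9, 0.3, 7850.0])
x = np.concatenate([base, physics_features(*base)])   # augmented DNN input
print(x.shape)
```

Injecting `x` (or just the last three entries, into deeper layers) is the augmentation strategy the study evaluates.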
    ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes. (arXiv:2201.07788v2 [cs.CV] UPDATED)
    Progress in 3D object understanding has relied on manually canonicalized shape datasets that contain instances with consistent position and orientation (3D pose). This has made it hard to generalize these methods to in-the-wild shapes, e.g., from internet model collections or depth sensors. ConDor is a self-supervised method that learns to Canonicalize the 3D orientation and position for full and partial 3D point clouds. We build on top of Tensor Field Networks (TFNs), a class of permutation- and rotation-equivariant, and translation-invariant 3D networks. During inference, our method takes an unseen full or partial 3D point cloud at an arbitrary pose and outputs an equivariant canonical pose. During training, this network uses self-supervision losses to learn the canonical pose from an un-canonicalized collection of full and partial 3D point clouds. ConDor can also learn to consistently co-segment object parts without any supervision. Extensive quantitative results on four new metrics show that our approach outperforms existing methods while enabling new applications such as operation on depth images and annotation transfer.
    Concentration of Random Feature Matrices in High-Dimensions. (arXiv:2204.06935v1 [stat.ML])
    The spectra of random feature matrices provide essential information on the conditioning of the linear system used in random feature regression problems and are thus connected to the consistency and generalization of random feature models. Random feature matrices are asymmetric rectangular nonlinear matrices depending on two input variables, the data and the weights, which can make their characterization challenging. We consider two settings for the two input variables: either both are random variables, or one is a random variable and the other is well-separated, i.e., there is a minimum distance between points. With conditions on the dimension, the complexity ratio, and the sampling variance, we show that the singular values of these matrices concentrate near their full expectation and near one with high probability. In particular, since the dimension depends only on the logarithm of the number of random weights or the number of data points, our complexity bounds can be achieved even in moderate dimensions for many practical settings. The theoretical results are verified with numerical experiments.
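    A minimal numerical illustration of such a random feature matrix. The cosine nonlinearity, the 1/&#8730;m scaling, and the problem sizes are chosen for illustration and may differ from the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 200, 400, 50   # data points, random features, ambient dimension

X = rng.normal(size=(n, d)) / np.sqrt(d)   # random data (random-variable setting)
W = rng.normal(size=(d, m)) / np.sqrt(d)   # random weights

# Random feature matrix A_ij = phi(<x_i, w_j>) with a 1/sqrt(m) scaling;
# its singular values govern the conditioning of random feature regression.
A = np.cos(X @ W) / np.sqrt(m)

s = np.linalg.svd(A, compute_uv=False)
print(f"singular values lie in [{s.min():.3f}, {s.max():.3f}]")
```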
    Time Series of Non-Additive Metrics: Identification and Interpretation of Contributing Factors of Variance by Linear Decomposition. (arXiv:2204.06688v1 [cs.LG])
    The research paper addresses linear decomposition of time series of non-additive metrics that allows for the identification and interpretation of contributing factors (input features) of variance. Non-additive metrics, such as ratios, are widely used in a variety of domains. It commonly requires preceding aggregations of underlying variables that are used to calculate the metric of interest. The latter poses a dimensionality challenge when the input features and underlying variables are formed as two-dimensional arrays along elements, such as account or customer identifications, and time points. It rules out direct modeling of the time series of a non-additive metric as a function of input features. The article discusses a five-step approach: (1) segmentations of input features and the underlying variables of the metric that are supported by unsupervised autoencoders, (2) univariate or joint fittings of the metric by the aggregated input features on the segmented domains, (3) transformations of pre-screened input features according to the fitted models, (4) aggregation of the transformed features as time series, and (5) modelling of the metric time series as a sum of constrained linear effects of the aggregated features. Alternatively, approximation by numerical differentiation has been considered to linearize the metric. It allows for element-level univariate or joint modeling of step (2). The process of these analytical steps allows for a backward-looking explanatory decomposition of the metric as a sum of time series of the surviving input features. The paper includes a synthetic example that studies loss-to-balance monthly rates of a hypothetical retail credit portfolio. To validate that no latent factors other than the surviving input features have significant impacts on the metric, Statistical Process Control has been introduced for the residual time series.
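    The numerical-differentiation linearization of a ratio metric can be illustrated on a toy example. This is a first-order Taylor sketch of the general idea (variable names and numbers are made up, not the paper's notation):

```python
# Linearize a ratio metric r = L / B (e.g., a loss-to-balance rate),
# splitting the change in r into a numerator effect and a denominator
# effect via the partial derivatives at the starting point.

def linearize_ratio(L0, B0, L1, B1):
    """Return (numerator effect, denominator effect, residual) for the
    change in L/B between two time points."""
    dL, dB = L1 - L0, B1 - B0
    num_effect = dL / B0              # (dr/dL) * dL
    den_effect = -L0 * dB / B0**2     # (dr/dB) * dB
    residual = (L1 / B1 - L0 / B0) - (num_effect + den_effect)
    return num_effect, den_effect, residual

num, den, res = linearize_ratio(L0=2.0, B0=100.0, L1=2.4, B1=110.0)
print(num, den, res)  # effects sum (with residual) to the actual change in r
```

The residual shrinks as the per-period changes get small, which is what makes the element-level linear decomposition in step (2) workable.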
    End-to-end multi-particle reconstruction in high occupancy imaging calorimeters with graph neural networks. (arXiv:2204.01681v2 [physics.ins-det] UPDATED)
    We present an end-to-end reconstruction algorithm to build particle candidates from detector hits in next-generation granular calorimeters similar to that foreseen for the high-luminosity upgrade of the CMS detector. The algorithm exploits a distance-weighted graph neural network, trained with object condensation, a graph segmentation technique. Through a single-shot approach, the reconstruction task is paired with energy regression. We describe the reconstruction performance in terms of efficiency as well as in terms of energy resolution. In addition, we show the jet reconstruction performance of our method and discuss its inference computational cost. To our knowledge, this work is the first-ever example of single-shot calorimetric reconstruction of ${\cal O}(1000)$ particles in high-luminosity conditions with 200 pileup.
    Machine Learning State-of-the-Art with Uncertainties. (arXiv:2204.05173v2 [cs.LG] UPDATED)
    With the availability of data, hardware, software ecosystems, and relevant skill sets, the machine learning community is undergoing rapid development, with new architectures and approaches appearing at high frequency every year. In this article, we conduct an exemplary image classification study in order to demonstrate how confidence intervals around accuracy measurements can greatly enhance the communication of research results as well as impact the reviewing process. In addition, we explore the hallmarks and limitations of this approximation. We discuss the relevance of this approach reflecting on a spotlight publication of ICLR22. A reproducible workflow is made available as an open-source adjoint to this publication. Based on our discussion, we make suggestions for improving the authoring and reviewing process of machine learning articles.
    Characterizing the Fundamental Trade-offs in Learning Invariant Representations. (arXiv:2109.03386v2 [cs.LG] UPDATED)
    Many applications of representation learning, such as privacy-preservation, algorithmic fairness, and domain adaptation, desire explicit control over semantic information being discarded. This goal is formulated as satisfying two objectives: maximizing utility for predicting a target attribute while simultaneously being independent or invariant with respect to a known semantic attribute. Solutions to such problems lead to trade-offs between the two objectives when they are competing with each other. While existing works study bounds on these trade-offs, three questions still remain outstanding: 1) \emph{What are the exact fundamental trade-offs between utility and invariance?}, 2) \emph{What is the optimal dimensionality of the representation?}, and 3) \emph{What are the encoders (mapping data to a representation) that achieve the exact fundamental trade-offs and how can we estimate them from data?} This paper addresses these questions. We adopt a functional analysis perspective and derive closed-form solutions for the global optima of the underlying optimization problems under mild assumptions, which in turn yields closed formulae for the exact trade-offs, optimal representation dimensionality, and the corresponding encoders. We also numerically quantify the trade-offs on representative problems and compare them to those achieved by baseline invariant representation learning algorithms.
    StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2. (arXiv:2112.14683v2 [cs.CV] UPDATED)
    Videos show continuous events, yet most $-$ if not all $-$ video synthesis frameworks treat them discretely in time. In this work, we think of videos as what they should be $-$ time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator. For this, we first design continuous motion representations through the lens of positional embeddings. Then, we explore the question of training on very sparse videos and demonstrate that a good generator can be learned by using as few as 2 frames per clip. After that, we rethink the traditional image + video discriminators pair and design a holistic discriminator that aggregates temporal information by simply concatenating frames' features. This decreases the training cost and provides a richer learning signal to the generator, making it possible to train directly on 1024$^2$ videos for the first time. We build our model on top of StyleGAN2 and it is just ${\approx}5\%$ more expensive to train at the same resolution while achieving almost the same image quality. Moreover, our latent space features similar properties, enabling spatial manipulations that our method can propagate in time. We can generate arbitrarily long videos at arbitrary high frame rate, while prior work struggles to generate even 64 frames at a fixed rate. Our model is tested on four modern 256$^2$ and one 1024$^2$-resolution video synthesis benchmarks. In terms of sheer metrics, it performs on average ${\approx}30\%$ better than the closest runner-up. Project website: https://universome.github.io.
    Kernel Thinning. (arXiv:2105.05842v7 [stat.ML] UPDATED)
    We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
    Epileptic Seizure Risk Assessment by Multi-Channel Imaging of the EEG. (arXiv:2204.07034v1 [eess.SP])
    Refractory epileptic patients can suffer a seizure at any moment. Seizure prediction would substantially improve their lives. In this work, based on scalp EEG and its transformation into images, the likelihood of an epileptic seizure occurring at any moment is computed using an average of the softmax layer output (the likelihood) of a CNN, instead of the output of the classification layer. Results show that by analyzing the likelihood and thresholding it, prediction has higher sensitivity or a lower FPR/h. The best threshold for the likelihood was higher than 50% for 5 patients, and was lower for the remaining 36. However, more testing is needed, especially in new seizures, to better assess the real performance of this method. This work is a proof of concept with a positive outlook.
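    The likelihood-averaging-and-thresholding step described above can be sketched as follows. The window length and threshold here are invented for illustration (the abstract notes that the best threshold is tuned per patient):

```python
import numpy as np

def seizure_likelihood(softmax_probs, window=5, threshold=0.6):
    """Average the CNN softmax 'seizure' probability over a sliding
    window and raise an alarm whenever the average crosses a threshold,
    instead of thresholding the raw classification output frame by frame."""
    kernel = np.ones(window) / window
    avg = np.convolve(softmax_probs, kernel, mode="valid")  # moving average
    return avg, avg >= threshold

# Hypothetical per-window seizure probabilities from the CNN softmax layer.
probs = np.array([0.1, 0.2, 0.7, 0.8, 0.9, 0.9, 0.8, 0.3, 0.2, 0.1])
avg, alarms = seizure_likelihood(probs)
print(alarms.any())   # -> True
```

Raising or lowering `threshold` trades sensitivity against FPR/h, which is the trade-off the abstract reports.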
    Learning Spectral Unions of Partial Deformable 3D Shapes. (arXiv:2104.00514v2 [cs.GR] UPDATED)
    Spectral geometric methods have brought revolutionary changes to the field of geometry processing. Of particular interest is the study of the Laplacian spectrum as a compact, isometry- and permutation-invariant representation of a shape. Some recent works show how the intrinsic geometry of a full shape can be recovered from its spectrum, but no approaches consider the more challenging problem of recovering the geometry from the spectral information of partial shapes. In this paper, we propose a possible way to fill this gap. We introduce a learning-based method to estimate the Laplacian spectrum of the union of partial non-rigid 3D shapes, without actually computing the 3D geometry of the union or any correspondence between those partial shapes. We do so by operating purely in the spectral domain and by defining the union operation between short sequences of eigenvalues. We show that the approximated union spectrum can be used as-is to reconstruct the complete geometry [MRC*19], perform region localization on a template [RTO*19] and retrieve shapes from a database, generalizing ShapeDNA [RWP06] to work with partialities. Working with eigenvalues allows us to deal with unknown correspondence, different sampling, and different discretizations (point clouds and meshes alike), making this operation especially robust and general. Our approach is data-driven and can generalize to isometric and non-isometric deformations of the surface, as long as these stay within the same semantic class (e.g., human bodies or horses), as well as to partiality artifacts not seen at training time.
    A Neural Network based Framework for Effective Laparoscopic Video Quality Assessment. (arXiv:2202.04517v2 [eess.IV] UPDATED)
    Video quality assessment is a challenging problem having a critical significance in the context of medical imaging. For instance, in laparoscopic surgery, the acquired video data suffers from different kinds of distortion that not only hinder surgery performance but also affect the execution of subsequent tasks in surgical navigation and robotic surgeries. For this reason, we propose in this paper neural network-based approaches for distortion classification as well as quality prediction. More precisely, a Residual Network (ResNet) based approach is firstly developed for simultaneous ranking and classification task. Then, this architecture is extended to make it appropriate for the quality prediction task by using an additional Fully Connected Neural Network (FCNN). To train the overall architecture (ResNet and FCNN models), transfer learning and end-to-end learning approaches are investigated. Experimental results, carried out on a new laparoscopic video quality database, have shown the efficiency of the proposed methods compared to recent conventional and deep learning based approaches.
    SemiMultiPose: A Semi-supervised Multi-animal Pose Estimation Framework. (arXiv:2204.07072v1 [cs.CV])
    Multi-animal pose estimation is essential for studying animals' social behaviors in neuroscience and neuroethology. Advanced approaches have been proposed to support multi-animal estimation and achieve state-of-the-art performance. However, these models rarely exploit unlabeled data during training even though real world applications have exponentially more unlabeled frames than labeled frames. Manually adding dense annotations for a large number of images or videos is costly and labor-intensive, especially for multiple instances. Given these deficiencies, we propose a novel semi-supervised architecture for multi-animal pose estimation, leveraging the abundant structures pervasive in unlabeled frames in behavior videos to enhance training, which is critical for sparsely-labeled problems. The resulting algorithm provides superior multi-animal pose estimation results on three animal experiments compared to the state-of-the-art baseline and exhibits more predictive power in sparsely-labeled data regimes.
    Transformers and the representation of biomedical background knowledge. (arXiv:2202.02432v2 [cs.CL] UPDATED)
    BioBERT and BioMegatron are Transformer models adapted for the biomedical domain based on publicly available biomedical corpora. As such, they have the potential to encode large-scale biological knowledge. We investigate the encoding and representation of biological knowledge in these models, and its potential utility to support inference in cancer precision medicine - namely, the interpretation of the clinical significance of genomic alterations. We compare the performance of different transformer baselines; we use probing to determine the consistency of encodings for distinct entities; and we use clustering methods to compare and contrast the internal properties of the embeddings for genes, variants, drugs and diseases. We show that these models do indeed encode biological knowledge, although some of this is lost in fine-tuning for specific tasks. Finally, we analyse how the models behave with regard to biases and imbalances in the dataset.
    Q-TART: Quickly Training for Adversarial Robustness and in-Transferability. (arXiv:2204.07024v1 [cs.CV])
    Raw deep neural network (DNN) performance is not enough; in real-world settings, computational load, training efficiency and adversarial security are just as or even more important. We propose to simultaneously tackle Performance, Efficiency, and Robustness, using our proposed algorithm Q-TART, Quickly Train for Adversarial Robustness and in-Transferability. Q-TART follows the intuition that samples highly susceptible to noise strongly affect the decision boundaries learned by DNNs, which in turn degrades their performance and increases their adversarial susceptibility. By identifying and removing such samples, we demonstrate improved performance and adversarial robustness while using only a subset of the training data. Through our experiments we highlight Q-TART's high performance across multiple Dataset-DNN combinations, including ImageNet, and provide insights into the complementary behavior of Q-TART alongside existing adversarial training approaches to increase robustness by over 1.3% while using up to 17.9% less training time.
    HASA: Hybrid Architecture Search with Aggregation Strategy for Echinococcosis Classification and Ovary Segmentation in Ultrasound Images. (arXiv:2204.06697v1 [cs.CV])
    Different from handcrafted features, deep neural networks can automatically learn task-specific features from data. Due to this data-driven nature, they have achieved remarkable success in various areas. However, manual design and selection of suitable network architectures are time-consuming and require substantial effort of human experts. To address this problem, researchers have proposed neural architecture search (NAS) algorithms which can automatically generate network architectures but suffer from heavy computational cost and instability if searching from scratch. In this paper, we propose a hybrid NAS framework for ultrasound (US) image classification and segmentation. The hybrid framework consists of a pre-trained backbone and several searched cells (i.e., network building blocks), which takes advantage of the strengths of both NAS and the expert knowledge from existing convolutional neural networks. Specifically, two effective and lightweight operations, a mixed depth-wise convolution operator and a squeeze-and-excitation block, are introduced into the candidate operations to enhance the variety and capacity of the searched cells. These two operations not only decrease model parameters but also boost network performance. Moreover, we propose a re-aggregation strategy for the searched cells, aiming to further improve the performance for different vision tasks. We tested our method on two large US image datasets, including a 9-class echinococcosis dataset containing 9566 images for classification and an ovary dataset containing 3204 images for segmentation. Ablation experiments and comparison with other handcrafted or automatically searched architectures demonstrate that our method can generate more powerful and lightweight models for the above US image classification and segmentation tasks.
    Streamable Neural Audio Synthesis With Non-Causal Convolutions. (arXiv:2204.07064v1 [cs.SD])
    Deep learning models are mostly used in an offline inference fashion. However, this strongly limits their use inside audio generation setups, as most creative workflows are based on real-time digital signal processing. Although approaches based on recurrent networks can be naturally adapted to this buffer-based computation, the use of convolutions still poses serious challenges. To tackle this issue, the use of causal streaming convolutions has been proposed. However, this requires specific, more complex training and can impact the resulting audio quality. In this paper, we introduce a new method for producing non-causal streaming models, which makes any convolutional model compatible with real-time buffer-based processing. As our method is based on a post-training reconfiguration of the model, we show that it can transform models trained without causal constraints into streaming models. We show how our method can be adapted to fit complex architectures with parallel branches. To evaluate our method, we apply it to the recent RAVE model, which provides high-quality real-time audio synthesis. We test our approach on multiple music and speech datasets and show that it is faster than overlap-add methods while having no impact on generation quality. Finally, we introduce two open-source implementations of our work as Max/MSP and PureData externals, and as a VST audio plugin. This endows traditional digital audio workstations with real-time neural audio synthesis on a laptop CPU.
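    The core trick behind buffer-based streaming convolutions is caching context between chunks. Below is a minimal sketch of that cached-padding idea for a single 1-D layer (not the paper's RAVE implementation; chunk sizes and the kernel are arbitrary):

```python
import numpy as np

class StreamingConv1d:
    """Buffer-based streaming wrapper for a 1-D convolution. Caching the
    last (kernel_size - 1) input samples between calls lets a (possibly
    non-causal) kernel run on consecutive chunks and reproduce the
    offline result, at the cost of a fixed output delay."""
    def __init__(self, kernel):
        self.kernel = np.asarray(kernel, dtype=float)
        self.pad = len(self.kernel) - 1
        self.buf = np.zeros(self.pad)       # cached context between chunks

    def process(self, chunk):
        x = np.concatenate([self.buf, chunk])
        self.buf = x[-self.pad:]            # keep context for the next call
        return np.convolve(x, self.kernel, mode="valid")

kernel = [0.25, 0.5, 0.25]                  # symmetric (non-causal) smoothing kernel
stream = StreamingConv1d(kernel)
signal = np.arange(8, dtype=float)

# Chunk-by-chunk processing matches the offline (zero-padded) convolution.
chunked = np.concatenate([stream.process(signal[i:i + 4]) for i in (0, 4)])
offline = np.convolve(np.concatenate([np.zeros(2), signal]), kernel, mode="valid")
print(np.allclose(chunked, offline))   # -> True
```

The delay corresponds to the kernel's lookahead; the paper's contribution is reconfiguring whole trained non-causal architectures (including parallel branches) into this form after training.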
    Geometric Deep Learning to Identify the Critical 3D Structural Features of the Optic Nerve Head for Glaucoma Diagnosis. (arXiv:2204.06931v1 [eess.IV])
    Purpose: The optic nerve head (ONH) undergoes complex and deep 3D morphological changes during the development and progression of glaucoma. Optical coherence tomography (OCT) is the current gold standard to visualize and quantify these changes, however the resulting 3D deep-tissue information has not yet been fully exploited for the diagnosis and prognosis of glaucoma. To this end, we aimed: (1) To compare the performance of two relatively recent geometric deep learning techniques in diagnosing glaucoma from a single OCT scan of the ONH; and (2) To identify the 3D structural features of the ONH that are critical for the diagnosis of glaucoma. Methods: In this study, we included a total of 2,247 non-glaucoma and 2,259 glaucoma scans from 1,725 subjects. All subjects had their ONHs imaged in 3D with Spectralis OCT. All OCT scans were automatically segmented using deep learning to identify major neural and connective tissues. Each ONH was then represented as a 3D point cloud. We used PointNet and dynamic graph convolutional neural network (DGCNN) to diagnose glaucoma from such 3D ONH point clouds and to identify the critical 3D structural features of the ONH for glaucoma diagnosis. Results: Both the DGCNN (AUC: 0.97$\pm$0.01) and PointNet (AUC: 0.95$\pm$0.02) were able to accurately detect glaucoma from 3D ONH point clouds. The critical points formed an hourglass pattern with most of them located in the inferior and superior quadrant of the ONH. Discussion: The diagnostic accuracy of both geometric deep learning approaches was excellent. Moreover, we were able to identify the critical 3D structural features of the ONH for glaucoma diagnosis that tremendously improved the transparency and interpretability of our method. Consequently, our approach may have strong potential to be used in clinical applications for the diagnosis and prognosis of a wide range of ophthalmic disorders.
    Integration of neural network and fuzzy logic decision making compared with bilayered neural network in the simulation of daily dew point temperature. (arXiv:2202.12256v2 [cs.LG] UPDATED)
    In this research, dew point temperature (DPT) is simulated using the data-driven approach. Adaptive Neuro-Fuzzy Inference System (ANFIS) is utilized as a data-driven technique to forecast this parameter at Tabriz in East Azerbaijan. Various input patterns, namely T min, T max, and T mean, are utilized for training the architecture whilst DPT is the model's output. The findings indicate that, in general, ANFIS method is capable of identifying data patterns with a high degree of accuracy. However, the approach demonstrates that processing time and computer resources may substantially increase by adding additional functions. Based on the results, the number of iterations and computing resources might change dramatically if new functionalities are included. As a result, tuning parameters have to be optimized inside the method framework. The findings demonstrate a high agreement between results by the data-driven technique (machine learning method) and the observed data. Using this prediction toolkit, DPT can be adequately forecasted solely based on the temperature distribution of Tabriz. This kind of modeling is extremely promising for predicting DPT at various sites. Besides, this study thoroughly compares the Bilayered Neural Network (BNN) and ANFIS models on various scales. Whilst the ANFIS model is extremely stable for almost all numbers of membership functions, the BNN model is highly sensitive to this scale factor to predict DPT.
    Improving Top-K Decoding for Non-Autoregressive Semantic Parsing via Intent Conditioning. (arXiv:2204.06748v1 [cs.CL])
    Semantic parsing (SP) is a core component of modern virtual assistants like Google Assistant and Amazon Alexa. While sequence-to-sequence-based auto-regressive (AR) approaches are common for conversational semantic parsing, recent studies employ non-autoregressive (NAR) decoders and reduce inference latency while maintaining competitive parsing quality. However, a major drawback of NAR decoders is the difficulty of generating top-k (i.e., k-best) outputs with approaches such as beam search. To address this challenge, we propose a novel NAR semantic parser that introduces intent conditioning on the decoder. Inspired by the traditional intent and slot tagging parsers, we decouple the top-level intent prediction from the rest of a parse. As the top-level intent largely governs the syntax and semantics of a parse, the intent conditioning allows the model to better control beam search and improves the quality and diversity of top-k outputs. We introduce a hybrid teacher-forcing approach to avoid training and inference mismatch. We evaluate the proposed NAR on conversational SP datasets, TOP & TOPv2. Like the existing NAR models, we maintain the O(1) decoding time complexity while generating more diverse outputs and improving the top-3 exact match (EM) by 2.4 points. In comparison with AR models, our model speeds up beam search inference by 6.7 times on CPU with competitive top-k EM.
    A Collection of Deep Learning-based Feature-Free Approaches for Characterizing Single-Objective Continuous Fitness Landscapes. (arXiv:2204.05752v2 [cs.LG] UPDATED)
    Exploratory Landscape Analysis is a powerful technique for numerically characterizing landscapes of single-objective continuous optimization problems. Landscape insights are crucial both for problem understanding as well as for assessing benchmark set diversity and composition. Despite the irrefutable usefulness of these features, they suffer from their own ailments and downsides. Hence, in this work we provide a collection of different approaches to characterize optimization landscapes. Similar to conventional landscape features, we require a small initial sample. However, instead of computing features based on that sample, we develop alternative representations of the original sample. These range from point clouds to 2D images and, therefore, are entirely feature-free. We demonstrate and validate our devised methods on the BBOB testbed and predict, with the help of Deep Learning, the high-level, expert-based landscape properties such as the degree of multimodality and the existence of funnel structures. The quality of our approaches is on par with methods relying on the traditional landscape features. Thereby, we provide an exciting new perspective on every research area which utilizes problem information such as problem understanding and algorithm design as well as automated algorithm configuration and selection.
    Proceedings of TDA: Applications of Topological Data Analysis to Data Science, Artificial Intelligence, and Machine Learning Workshop at SDM 2022. (arXiv:2204.01142v2 [math.AT] UPDATED)
    Topological Data Analysis (TDA) is a rigorous framework that borrows techniques from geometric and algebraic topology, category theory, and combinatorics in order to study the "shape" of complex high-dimensional data. Research in this area has grown significantly over the last several years bringing a deeply rooted theory to bear on practical applications in areas such as genomics, natural language processing, medicine, cybersecurity, energy, and climate change. Within some of these areas, TDA has also been used to augment AI and ML techniques. We believe there is further utility to be gained in this space that can be facilitated by a workshop bringing together experts (both theorists and practitioners) and non-experts. Currently there is an active community of pure mathematicians with research interests in developing and exploring the theoretical and computational aspects of TDA. Applied mathematicians and other practitioners are also present in the community but do not represent a majority. This speaks to the primary aim of this workshop which is to grow a wider community of interest in TDA. By fostering meaningful exchanges between these groups, from across the government, academia, and industry, we hope to create new synergies that can only come through building a mutual comprehensive awareness of the problem and solution spaces.
    ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision. (arXiv:2204.06863v1 [cs.LG])
    A way to overcome expensive and time-consuming manual data labeling is weak supervision - automatic annotation of data samples via a predefined set of labeling functions (LFs), rule-based mechanisms that generate potentially erroneous labels. In this work, we investigate noise reduction techniques for weak supervision based on the principle of k-fold cross-validation. In particular, we extend two frameworks for detecting the erroneous samples in manually annotated data to the weakly supervised setting. Our methods profit from leveraging the information about matching LFs and detect noisy samples more accurately. We also introduce a new algorithm for denoising the weakly annotated data called ULF, which refines the allocation of LFs to classes by estimating the reliable LFs-to-classes joint matrix. Evaluation on several datasets shows that ULF successfully improves weakly supervised learning without using any manually labeled data.
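    The cross-validation principle behind such noise detection can be sketched with a toy held-out-disagreement check. This is an illustrative stand-in, not ULF itself: a leave-one-out loop with a nearest-centroid classifier replaces the paper's k-fold scheme and LF-matching machinery, and the data is invented.

```python
import numpy as np

def loo_disagreements(X, y_weak):
    """Flag potentially mislabeled samples: for each sample, fit a toy
    nearest-centroid classifier on all *other* samples and mark it as
    noisy if the held-out prediction disagrees with its weak label."""
    noisy = np.zeros(len(X), dtype=bool)
    for i in range(len(X)):
        train = np.delete(np.arange(len(X)), i)
        centroids = {c: X[train][y_weak[train] == c].mean(axis=0)
                     for c in np.unique(y_weak[train])}
        pred = min(centroids, key=lambda c: np.linalg.norm(X[i] - centroids[c]))
        noisy[i] = pred != y_weak[i]
    return noisy

# Two well-separated clusters; sample 2 carries a flipped weak label.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y_weak = np.array([0, 0, 1, 1, 1, 1])
print(np.flatnonzero(loo_disagreements(X, y_weak)))   # -> [2]
```

Flagged samples can then be dropped or relabeled before training the end model, which is the noise-reduction step the abstract describes.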
    A Melody-Unsupervision Model for Singing Voice Synthesis. (arXiv:2110.06546v2 [eess.AS] UPDATED)
    Recent studies in singing voice synthesis have achieved high-quality results leveraging advances in text-to-speech models based on deep neural networks. One of the main issues in training singing voice synthesis models is that they require melody and lyric labels to be temporally aligned with audio data. This temporal alignment is time-consuming manual work when preparing the training data. To address the issue, we propose a melody-unsupervision model that requires only audio-and-lyrics pairs without temporal alignment at training time but generates singing voice audio given a melody and lyrics input at inference time. The proposed model is composed of a phoneme classifier and a singing voice generator jointly trained in an end-to-end manner. The model can be fine-tuned by adjusting the amount of supervision with temporally aligned melody labels. Through experiments in melody-unsupervision and semi-supervision settings, we compare the audio quality of synthesized singing voice. We also show that the proposed model is capable of being trained with speech audio and text labels but can generate singing voice at inference time.
    Unsupervised Temporal Learning on Monocular Videos for 3D Human Pose Estimation. (arXiv:2012.01511v3 [cs.CV] UPDATED)
    In this paper we propose an unsupervised learning method to extract temporal information from monocular videos: we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally distant ones as negative pairs, as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying CSS only to the time-variant features, and encouraging a gradual transition on them between nearby and distant frames while also reconstructing the input, extracts rich temporal features into the time-variant component that are well-suited for human pose estimation. Our approach reduces error by about 50% compared to standard CSS strategies, outperforms other unsupervised single-view methods, and matches the performance of multi-view techniques.
    Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values. (arXiv:2109.10431v2 [cs.LG] UPDATED)
    We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.
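    A minimal sketch of the MIA splitting rule mentioned above: missing values are routed to whichever branch of a candidate split minimizes the weighted impurity, so no imputation step is needed. This omits the paper's fairness regularizer and full tree induction; the function names are our own.

```python
import numpy as np

def gini(y):
    """Gini impurity of an integer label vector."""
    if len(y) == 0:
        return 0.0
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def mia_split(x, y, threshold):
    """Evaluate one split with Missing Incorporated as Attribute (MIA):
    NaN values go to whichever branch yields lower weighted impurity."""
    miss = np.isnan(x)
    left = (x <= threshold) & ~miss
    right = (x > threshold) & ~miss
    best = None
    for miss_goes_left in (True, False):
        l = left | (miss if miss_goes_left else np.zeros_like(miss))
        r = right | (miss if not miss_goes_left else np.zeros_like(miss))
        w = (l.sum() * gini(y[l]) + r.sum() * gini(y[r])) / len(y)
        if best is None or w < best[0]:
            best = (w, miss_goes_left)
    return best  # (weighted impurity, route-missing-left?)

# Missing values here co-occur with label 1, so routing them right
# produces a pure split.
x = np.array([1.0, 2.0, np.nan, 8.0, 9.0, np.nan])
y = np.array([0, 0, 1, 1, 1, 1])
impurity, miss_left = mia_split(x, y, threshold=5.0)
```

Because the missingness pattern itself carries label information, the learned routing direction acts as an implicit extra attribute value.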
    A Simple and Efficient Sampling-based Algorithm for General Reachability Analysis. (arXiv:2112.05745v3 [eess.SY] UPDATED)
    In this work, we analyze an efficient sampling-based algorithm for general-purpose reachability analysis, which remains a notoriously challenging problem with applications ranging from neural network verification to safety analysis of dynamical systems. By sampling inputs, evaluating their images in the true reachable set, and taking their $\epsilon$-padded convex hull as a set estimator, this algorithm applies to general problem settings and is simple to implement. Our main contribution is the derivation of asymptotic and finite-sample accuracy guarantees using random set theory. This analysis informs algorithmic design to obtain an $\epsilon$-close reachable set approximation with high probability, provides insights into which reachability problems are most challenging, and motivates safety-critical applications of the technique. On a neural network verification task, we show that this approach is more accurate and significantly faster than prior work. Informed by our analysis, we also design a robust model predictive controller that we demonstrate in hardware experiments.
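    The sample-and-pad estimator is easy to sketch in one dimension, where the convex hull of the sampled images is just an interval; the function below is an illustrative reduction of the algorithm, not the authors' implementation.

```python
import numpy as np

def sampled_reach_interval(f, lo, hi, n, eps, seed=0):
    """Estimate the reachable set of a scalar map f over [lo, hi] by
    sampling inputs, evaluating their images, and padding the convex
    hull (an interval in 1D) by eps."""
    rng = np.random.default_rng(seed)
    ys = f(rng.uniform(lo, hi, size=n))
    return ys.min() - eps, ys.max() + eps

# The true reachable set of sin over [0, pi] is [0, 1]; with enough
# samples the eps-padded estimate covers it with high probability.
lo_hat, hi_hat = sampled_reach_interval(np.sin, 0.0, np.pi, n=2000, eps=0.05)
```

The guarantee in the paper is exactly of this flavor: for enough samples, the padded hull contains the true reachable set with high probability while over-approximating it by at most a controlled margin.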
    Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy, and Multi-Fidelity Feedback. (arXiv:2110.07232v2 [cs.LG] UPDATED)
    In black-box optimization problems, we aim to maximize an unknown objective function, where the function is only accessible through the feedback of an evaluation or simulation oracle. In real life, such feedback is often noisy and available only after some unknown delay that may depend on the computation time of the oracle. Additionally, if exact evaluations are expensive but coarse approximations are available at a lower cost, the feedback can be multi-fidelity. To address this problem, we propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS), that flexibly accommodates a delay- and noise-tolerant bandit algorithm. We provide a generic proof technique to quantify the regret of PCTS under delayed, noisy, and multi-fidelity feedback. Specifically, we derive regret bounds of PCTS enabled with delayed-UCB1 (DUCB1) and delayed-UCB-V (DUCBV) algorithms. Given a horizon $T$, PCTS retains the regret bound of non-delayed HOO for expected delay of $O(\log T)$ and worsens by $O(T^{\frac{1-\alpha}{d+2}})$ for expected delays of $O(T^{1-\alpha})$ for $\alpha \in (0,1]$. We experimentally validate on multiple synthetic functions and hyperparameter tuning problems that PCTS outperforms state-of-the-art black-box optimization methods for feedback with different noise levels, delays, and fidelities.
    PEg TRAnsfer Workflow recognition challenge report: Does multi-modal data improve recognition?. (arXiv:2202.05821v2 [cs.LG] UPDATED)
    This paper presents the design and results of the "PEg TRAnsfer Workflow recognition" (PETRAW) challenge, whose objective was to develop surgical workflow recognition methods based on one or several modalities, among video, kinematic, and segmentation data, in order to study their added value. The PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator. This data set was composed of videos, kinematics, semantic segmentation, and workflow annotations which described the sequences at three granularity levels: phase, step, and activity. Five tasks were proposed to the participants: three of them were related to the recognition of all granularities with one of the available modalities, while the others addressed recognition with a combination of modalities. Average application-dependent balanced accuracy (AD-Accuracy) was used as the evaluation metric to take unbalanced classes into account and because it is more clinically relevant than a frame-by-frame score. Seven teams participated in at least one task and four of them in all tasks. The best results were obtained using the video and kinematic data, with an AD-Accuracy between 90% and 93% for the four teams who participated in all tasks. The improvement of video/kinematic-based methods over uni-modal ones was significant for all teams. However, the difference in testing execution time between the video/kinematic-based and the kinematic-based methods has to be taken into consideration: is it relevant to spend 20 to 200 times more computing time for less than 3% improvement? The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.
    Modelling Non-Smooth Signals with Complex Spectral Structure. (arXiv:2203.06997v2 [stat.ML] UPDATED)
    The Gaussian Process Convolution Model (GPCM; Tobar et al., 2015a) is a model for signals with complex spectral structure. A significant limitation of the GPCM is that it assumes a rapidly decaying spectrum: it can only model smooth signals. Moreover, inference in the GPCM currently requires (1) a mean-field assumption, resulting in poorly calibrated uncertainties, and (2) a tedious variational optimisation of large covariance matrices. We redesign the GPCM model to induce a richer distribution over the spectrum with relaxed assumptions about smoothness: the Causal Gaussian Process Convolution Model (CGPCM) introduces a causality assumption into the GPCM, and the Rough Gaussian Process Convolution Model (RGPCM) can be interpreted as a Bayesian nonparametric generalisation of the fractional Ornstein-Uhlenbeck process. We also propose a more effective variational inference scheme, going beyond the mean-field assumption: we design a Gibbs sampler which directly samples from the optimal variational solution, circumventing any variational optimisation entirely. The proposed variations of the GPCM are validated in experiments on synthetic and real-world data, showing promising results.
    Adversarial Parameter Defense by Multi-Step Risk Minimization. (arXiv:2109.02889v2 [cs.LG] UPDATED)
    Previous studies demonstrate that DNNs are vulnerable to adversarial examples and that adversarial training can establish a defense against them. In addition, recent studies show that deep neural networks also exhibit vulnerability to parameter corruptions. The vulnerability of model parameters is of crucial value to the study of model robustness and generalization. In this work, we introduce the concept of parameter corruption and propose to leverage loss change indicators for measuring the flatness of the loss basin and the robustness of neural network parameters. On this basis, we analyze parameter corruptions and propose the multi-step adversarial corruption algorithm. To harden neural networks, we propose an adversarial parameter defense algorithm that minimizes the average risk of multiple adversarial parameter corruptions. Experimental results show that the proposed algorithm can improve both the parameter robustness and the accuracy of neural networks.
    The Pseudo Projection Operator: Applications of Deep Learning to Projection Based Filtering in Non-Trivial Frequency Regimes. (arXiv:2111.07140v3 [eess.SP] UPDATED)
    Traditional frequency-based projection filters, or projection operators (PO), separate signal and noise through a series of transformations which remove frequencies where noise is present. However, this technique relies on a priori knowledge of which frequencies contain signal and noise, and on these frequencies not overlapping, which is difficult to achieve in practice. To address these issues, we introduce a PO-neural network hybrid model, the Pseudo Projection Operator (PPO), which leverages a neural network to perform frequency selection. We compare the filtering capabilities of a PPO, PO, and denoising autoencoder (DAE) on the University of Rochester Multi-Modal Music Performance Dataset with a variety of added noise types. In the majority of experiments, the PPO outperforms both the PO and DAE. Based upon these results, we suggest future application of the PPO to filtering problems in the physical and biological sciences.
    Planting Undetectable Backdoors in Machine Learning Models. (arXiv:2204.06974v1 [cs.LG])
    Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. First, we show how to plant a backdoor in any model, using digital signature schemes. The construction guarantees that given black-box access to the original model and the backdoored version, it is computationally infeasible to find even a single input where they differ. This property implies that the backdoored model has generalization error comparable with the original model. Second, we demonstrate how to insert undetectable backdoors in models trained using the Random Fourier Features (RFF) learning paradigm or in Random ReLU networks. In this construction, undetectability holds against powerful white-box distinguishers: given a complete description of the network and the training data, no efficient distinguisher can guess whether the model is "clean" or contains a backdoor. Our construction of undetectable backdoors also sheds light on the related issue of robustness to adversarial examples. In particular, our construction can produce a classifier that is indistinguishable from an "adversarially robust" classifier, but where every input has an adversarial example! In summary, the existence of undetectable backdoors represents a significant theoretical roadblock to certifying adversarial robustness.
    Your fairness may vary: Pretrained language model fairness in toxic text classification. (arXiv:2108.01250v3 [cs.CL] UPDATED)
    The popularity of pretrained language models in natural language processing systems calls for a careful evaluation of such models in downstream tasks, which have a higher potential for societal impact. The evaluation of such systems usually focuses on accuracy measures. Our findings in this paper call for attention to be paid to fairness measures as well. Through the analysis of more than a dozen pretrained language models of varying sizes on two toxic text classification tasks (English), we demonstrate that focusing on accuracy measures alone can lead to models with wide variation in fairness characteristics. Specifically, we observe that fairness can vary even more than accuracy with increasing training data size and different random initializations. At the same time, we find that little of the fairness variation is explained by model size, despite claims in the literature. To improve model fairness without retraining, we show that two post-processing methods developed for structured, tabular data can be successfully applied to a range of pretrained language models. Warning: This paper contains samples of offensive text.
    Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning. (arXiv:2106.06047v2 [cs.LG] UPDATED)
    Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model, especially when dealing with heterogeneous data. We release our code and pretrained models at https://github.com/Liangqiong/ViT-FL-main to encourage future exploration in robust architectures as an alternative to current research efforts on the optimization front.
    LDPC codes: tracking non-stationary channel noise using sequential variational Bayesian estimates. (arXiv:2204.07037v1 [eess.SP])
    We present a sequential Bayesian learning method for tracking non-stationary signal-to-noise ratios in LDPC codes using probabilistic graphical models. We represent the LDPC code as a cluster graph using a general-purpose cluster graph construction algorithm called the layered trees running intersection property (LTRIP) algorithm. The channel noise estimator is a global Gamma cluster, which we extend to allow for Bayesian tracking of non-stationary noise variation. We evaluate our proposed model on real-world 5G drive test data. Our results show that our model is capable of tracking non-stationary channel noise and outperforms an LDPC decoder with fixed knowledge of the actual average channel noise.
    Exploring Dual Encoder Architectures for Question Answering. (arXiv:2204.07120v1 [cs.CL])
    Dual encoders have been used for question-answering (QA) and information retrieval (IR) tasks with good results. There are two major types of dual encoders, Siamese Dual Encoders (SDE), with parameters shared across two encoders, and Asymmetric Dual Encoder (ADE), with two distinctly parameterized encoders. In this work, we explore the dual encoder architectures for QA retrieval tasks. By evaluating on MS MARCO and the MultiReQA benchmark, we show that SDE performs significantly better than ADE. We further propose three different improved versions of ADEs. Based on the evaluation of QA retrieval tasks and direct analysis of the embeddings, we demonstrate that sharing parameters in projection layers would enable ADEs to perform competitively with SDEs.
    Character-focused Video Thumbnail Retrieval. (arXiv:2204.06563v1 [cs.CV])
    We explore retrieving character-focused video frames as candidates for being video thumbnails. To evaluate each frame of the video based on the character(s) present in it, characters (faces) are evaluated on two aspects. Facial expression: We train a CNN model to measure whether a face has an acceptable facial expression for being in a video thumbnail. This model is trained to distinguish faces extracted from artworks/thumbnails from faces extracted from random frames of videos. Prominence and interactions: Character(s) in the thumbnail should be important character(s) in the video, to prevent the algorithm from suggesting non-representative frames as candidates. We use face clustering to identify the characters in the video, and form a graph in which the prominence (frequency of appearance) of the character(s), and their interactions (co-occurrence), are captured. We use this graph to infer the relevance of the characters present in each candidate frame. Once every face is scored based on the two criteria above, we infer frame-level scores by combining the scores for all the faces within a frame.
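    The prominence/co-occurrence scoring can be sketched on toy data, with each frame represented as the set of character identities recovered by face clustering (a simplified stand-in for the paper's graph inference; the scoring rule and names are our own):

```python
from collections import Counter
from itertools import combinations

def score_frames(frames):
    """Score each frame by the prominence (appearance count) of its
    characters plus their pairwise co-occurrence counts.
    `frames` is a list of sets of character ids."""
    prominence = Counter(c for f in frames for c in f)
    cooc = Counter()
    for f in frames:
        for pair in combinations(sorted(f), 2):
            cooc[pair] += 1
    scores = []
    for f in frames:
        s = sum(prominence[c] for c in f)
        s += sum(cooc[pair] for pair in combinations(sorted(f), 2))
        scores.append(s)
    return scores

# 'hero' appears in most frames, so the hero+sidekick frame wins.
frames = [{"hero"}, {"hero", "sidekick"}, {"hero"}, {"extra"}]
scores = score_frames(frames)
```

Frames containing frequently appearing, frequently interacting characters score highest, which is the intended bias toward representative thumbnails.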
    CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing. (arXiv:2204.06625v1 [cs.CL])
    Model ensemble is a popular approach to produce a low-variance and well-generalized model. However, it induces large memory and inference costs, which are often not affordable for real-world deployment. Existing work has resorted to sharing weights among models. However, when increasing the proportion of the shared weights, the resulting models tend to be similar, and the benefits of using model ensemble diminish. To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO. Specifically, we share the weights of bottom layers across all models and apply different perturbations to the hidden representations for different models, which can effectively promote the model diversity. Meanwhile, we apply a prediction consistency regularizer across the perturbed models to control the variance due to the model diversity. Our experiments using large language models demonstrate that CAMERO significantly improves the generalization performance of the ensemble model. Specifically, CAMERO outperforms the standard ensemble of 8 BERT-base models on the GLUE benchmark by 0.7 with a significantly smaller model size (114.2M vs. 880.6M).
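    One plausible reading of the consistency regularizer, sketched in numpy: penalize each perturbed member's divergence from the ensemble-average prediction. This is an illustration of the idea, not the paper's exact objective.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_per_model, lam=1.0):
    """Mean KL divergence from each perturbed member's predictive
    distribution to the ensemble-average distribution."""
    probs = softmax(np.stack(logits_per_model))   # (models, batch, classes)
    mean_p = probs.mean(axis=0, keepdims=True)
    kl = np.sum(probs * (np.log(probs) - np.log(mean_p)), axis=-1)
    return lam * kl.mean()

# Identical members incur no penalty; divergent members are penalized,
# which is how the regularizer controls ensemble variance.
identical = [np.array([[2.0, 0.0]]), np.array([[2.0, 0.0]])]
diverse = [np.array([[2.0, 0.0]]), np.array([[0.0, 2.0]])]
l_same = consistency_loss(identical)
l_diff = consistency_loss(diverse)
```

In training this term would be added to the task loss, trading off the diversity induced by the perturbations against prediction agreement.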
    Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback. (arXiv:2204.06660v1 [cs.LG])
    We study the problem of expert advice under partial bandit feedback setting and create a sequential minimax optimal algorithm. Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner. Our algorithm adopts a universal prediction perspective, whose performance is analyzed with regret against a general expert selection sequence. The regret we study is against a general competition class that covers many settings (such as the switching or contextual experts settings) and the expert selection sequences in the competition class are determined by the application at hand. Our regret bounds are second order bounds in terms of the sum of squared losses and the normalized regret of our algorithm is invariant under arbitrary affine transforms of the loss sequence. Our algorithm is truly online and does not use any preliminary information about the loss sequences.
    A Unified Analysis of Dynamic Interactive Learning. (arXiv:2204.07071v1 [cs.LG])
    In this paper we investigate the problem of learning evolving concepts over a combinatorial structure. Previous work by Emamjomeh-Zadeh et al. [2020] introduced dynamics into interactive learning as a way to model non-static user preferences in clustering problems or recommender systems. We make several contributions to this problem. First, we give a framework that captures both of the models analyzed by [Emamjomeh-Zadeh et al., 2020], which allows us to study any type of concept evolution and matches the same query complexity bounds and running time guarantees of the previous models. Using this general model we solve the open problem of closing the gap between the upper and lower bounds on query complexity. Finally, we study an efficient algorithm in which the learner simply follows the feedback at each round, and we provide mistake bounds for low-diameter graphs such as cliques, stars, and general o(log n)-diameter graphs by using a Markov chain model.
    Improving Computational Complexity in Statistical Models with Second-Order Information. (arXiv:2202.04219v3 [stat.ML] UPDATED)
    It is known that when statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes a polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory in applications. To improve this computational complexity, we consider utilizing second-order information in the design of optimization algorithms. Specifically, we study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models, a variant of gradient descent whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function. When the population loss function, i.e., the limit of the empirical loss function as $n$ goes to infinity, is homogeneous in all directions, we demonstrate that the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius. This is cheaper than the fixed step-size gradient descent algorithm, which requires $\mathcal{O}(n^{\tau})$ for some $\tau > 1$ to reach the same statistical radius. We illustrate our general theory under two statistical models, generalized linear models and mixture models, and experimental results support our general theory.
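    The NormGD update itself is a one-line modification of gradient descent; the sketch below applies it to a quartic loss whose Hessian is degenerate at the optimum, the singular regime the paper targets (a minimal illustration, not the paper's experiments):

```python
import numpy as np

def normgd(grad, hess, x0, iters=60):
    """Normalized gradient descent: each step is the gradient divided
    by the largest eigenvalue of the Hessian at the current iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        lam_max = np.linalg.eigvalsh(hess(x)).max()
        x = x - grad(x) / lam_max
    return x

# f(x) = ||x||^4: the Hessian vanishes at the optimum x = 0, so this
# loss is "singular" in the sense used above.
grad = lambda x: 4.0 * np.dot(x, x) * x
hess = lambda x: 8.0 * np.outer(x, x) + 4.0 * np.dot(x, x) * np.eye(len(x))
x_star = normgd(grad, hess, x0=[1.0, -1.0])
```

On this loss the iterate contracts by a constant factor of 2/3 per step despite the degenerate curvature, whereas fixed step-size gradient descent slows down polynomially near the optimum.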
    Sketching Algorithms and Lower Bounds for Ridge Regression. (arXiv:2204.06653v1 [cs.DS])
    We give a sketching-based iterative algorithm that computes $1+\varepsilon$ approximate solutions for the ridge regression problem $\min_x \|{Ax-b}\|_2^2 +\lambda\|{x}\|_2^2$ where $A \in \mathbb{R}^{n \times d}$ with $d \ge n$. Our algorithm, for a constant number of iterations (requiring a constant number of passes over the input), improves upon earlier work of Chowdhury et al., by requiring that the sketching matrix only has a weaker Approximate Matrix Multiplication (AMM) guarantee that depends on $\varepsilon$, along with a constant subspace embedding guarantee. The earlier work instead requires that the sketching matrix have a subspace embedding guarantee that depends on $\varepsilon$. For example, to produce a $1+\varepsilon$ approximate solution in $1$ iteration, which requires $2$ passes over the input, our algorithm requires the OSNAP embedding to have $m= O(n\sigma^2/\lambda\varepsilon)$ rows with a sparsity parameter $s = O(\log(n))$, whereas the earlier algorithm of Chowdhury et al., with the same number of rows of OSNAP requires a sparsity $s = O(\sqrt{\sigma^2/\lambda\varepsilon} \cdot \log(n))$, where $\sigma = \|{A}\|_2$ is the spectral norm of the matrix $A$. We also show that this algorithm can be used to give faster algorithms for kernel ridge regression. Finally, we show that the sketch size required for our algorithm is essentially optimal for a natural framework of algorithms for ridge regression by proving lower bounds on oblivious sketching matrices for AMM. The sketch size lower bounds for AMM may be of independent interest.
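    The general sketch-and-precondition idea behind such algorithms can be illustrated with a Gaussian sketch and plain iterative refinement; the paper's algorithm uses OSNAP embeddings and a sharper AMM-based analysis, so treat this as a generic sketch of the approach, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, m = 200, 10, 0.5, 100
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

# Preconditioner built from the sketched matrix SA: forming it costs
# O(m d^2) instead of O(n d^2), since SA has only m rows.
S = rng.normal(size=(m, n)) / np.sqrt(m)
SA = S @ A
M = SA.T @ SA + lam * np.eye(d)

# Iterative refinement on the exact normal equations, preconditioned
# by M; each iteration touches A only through matrix-vector products.
x = np.zeros(d)
for _ in range(50):
    residual = A.T @ b - (A.T @ (A @ x) + lam * x)
    x += np.linalg.solve(M, residual)

x_exact = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
```

Because the sketch only needs to approximate the spectrum of $A^\top A$, each refinement step contracts the error by a constant factor, giving a $1+\varepsilon$ solution in few passes.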
    Activation Regression for Continuous Domain Generalization with Applications to Crop Classification. (arXiv:2204.07030v1 [cs.CV])
    Geographic variance in satellite imagery impacts the ability of machine learning models to generalise to new regions. In this paper, we model geographic generalisation in medium resolution Landsat-8 satellite imagery as a continuous domain adaptation problem, demonstrating how models generalise better with appropriate domain knowledge. We develop a dataset spatially distributed across the entire continental United States, providing macroscopic insight into the effects of geography on crop classification in multi-spectral and temporally distributed satellite imagery. Our method demonstrates improved generalisability from 1) passing geographically correlated climate variables along with the satellite data to a Transformer model and 2) regressing on the model features to reconstruct these domain variables. Combined, we provide a novel perspective on geographic generalisation in satellite imagery and a simple-yet-effective approach to leverage domain knowledge. Code is available at: \url{https://github.com/samar-khanna/cropmap}
    Any-resolution Training for High-resolution Image Synthesis. (arXiv:2204.07156v1 [cs.CV])
    Generative models operate at fixed resolution, even though natural images come in a variety of sizes. As high-resolution details are downsampled away and low-resolution images are discarded altogether, precious supervision is lost. We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions. Taking advantage of this data is challenging; high-resolution processing is costly, and current architectures can only process fixed-resolution data. We introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions. First, conditioning the generator on a target scale allows us to generate higher-resolution images than previously possible, without adding layers to the model. Second, by conditioning on continuous coordinates, we can sample patches that still obey a consistent global layout, which also allows for scalable training at higher resolutions. Controlled FFHQ experiments show our method takes advantage of the multi-resolution training data better than discrete multi-scale approaches, achieving better FID scores and cleaner high-frequency details. We also train on other natural image domains including churches, mountains, and birds, and demonstrate arbitrary scale synthesis with both coherent global layouts and realistic local details, going beyond 2K resolution in our experiments. Our project page is available at: https://chail.github.io/anyres-gan/.
    Neonatal Bowel Sound Detection Using Convolutional Neural Network and Laplace Hidden Semi-Markov Model. (arXiv:2108.07467v2 [cs.SD] UPDATED)
    Abdominal auscultation is a convenient, safe and inexpensive method to assess bowel conditions, which is essential in neonatal care. It helps early detection of neonatal bowel dysfunctions and allows timely intervention. This paper presents a neonatal bowel sound detection method to assist the auscultation. Specifically, a Convolutional Neural Network (CNN) is proposed to classify peristalsis and non-peristalsis sounds. The classification is then optimized using a Laplace Hidden Semi-Markov Model (HSMM). The proposed method is validated on abdominal sounds from 49 newborn infants admitted to our tertiary Neonatal Intensive Care Unit (NICU). The results show that the method can effectively detect bowel sounds, with an accuracy and area under the curve (AUC) score of 89.81% and 83.96%, respectively, outperforming 13 baseline methods. Furthermore, the proposed Laplace HSMM refinement strategy is shown to be capable of enhancing other bowel sound detection models. The outcomes of this work have the potential to facilitate future telehealth applications for neonatal care. The source code of our work can be found at: https://bitbucket.org/chirudeakin/neonatal-bowel-sound-classification/
    Optimal Training of Fair Predictive Models. (arXiv:1910.04109v3 [stat.ML] UPDATED)
    Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the path-specific causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance by constraining baseline covariates, without requiring parametric models. We combine the merits of both proposals to optimize a hybrid reparameterized likelihood. The techniques presented here should be applicable more broadly to fair prediction proposals that impose constraints on predictive models.
    Masked Siamese Networks for Label-Efficient Learning. (arXiv:2204.07141v1 [cs.LG])
    We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark. Our code is publicly available.
    Activation Map Adaptation for Effective Knowledge Distillation. (arXiv:2010.13500v2 [cs.CV] UPDATED)
    Model compression has become a recent trend due to the requirement of deploying neural networks on embedded and mobile devices. Hence, both accuracy and efficiency are of critical importance. To explore a balance between them, a knowledge distillation strategy is proposed for general visual representation learning. It utilizes our well-designed activation map adaptive module to replace some blocks of the teacher network, exploring the most appropriate supervisory features adaptively during the training process. The teacher's hidden-layer outputs are used to guide the student network's training, transferring effective semantic information. To verify the effectiveness of our strategy, we apply the method to the CIFAR-10 dataset. Results demonstrate that the method can boost the accuracy of the student network by 0.6% with a 6.5% loss reduction, and significantly improve its training speed.
    Shedding New Light on the Language of the Dark Web. (arXiv:2204.06885v1 [cs.CL])
    The hidden nature and the limited accessibility of the Dark Web, combined with the lack of public datasets in this domain, make it difficult to study its inherent characteristics such as linguistic properties. Previous works on text classification in the Dark Web domain have suggested that the use of deep neural models may be ineffective, potentially due to the linguistic differences between the Dark and Surface Webs. However, not much work has been done to uncover the linguistic characteristics of the Dark Web. This paper introduces CoDA, a publicly available Dark Web dataset consisting of 10,000 web documents tailored towards text-based Dark Web analysis. By leveraging CoDA, we conduct a thorough linguistic analysis of the Dark Web and examine the textual differences between the Dark Web and the Surface Web. We also assess the performance of various methods of Dark Web page classification. Finally, we compare CoDA with an existing public Dark Web dataset and evaluate their suitability for various use cases.
    Estimating Structural Disparities for Face Models. (arXiv:2204.06562v1 [cs.CV])
    In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations (groups) of datapoints. Thus, the inputs to disparity quantification consist of a model's predictions $\hat{y}$, the ground-truth labels for the predictions $y$, and group labels $g$ for the data points. Performance of the model for each group is calculated by comparing $\hat{y}$ and $y$ for the datapoints within a specific group, and as a result, disparity of performance across the different groups can be calculated. In many real world scenarios however, group labels ($g$) may not be available at scale during training and validation time, or collecting them might not be feasible or desirable as they could often be sensitive information. As a result, evaluating disparity metrics across categorical groups would not be feasible. On the other hand, in many scenarios noisy groupings may be obtainable using some form of a proxy, which would allow measuring disparity metrics across sub-populations. Here we explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation. Our experiments indicate that embeddings resulting from an off-the-shelf face recognition model, could meaningfully serve as a proxy for such estimation.
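The disparity computation described above can be sketched directly from its definition, here with exact group labels; the paper's contribution is substituting a noisy proxy (e.g. face-embedding-derived groupings) when g is unavailable. All names and data below are hypothetical:

```python
def group_disparity(y_hat, y, g):
    """Accuracy gap between the best- and worst-performing groups."""
    acc = {}
    for group in set(g):
        idx = [i for i, gi in enumerate(g) if gi == group]
        acc[group] = sum(y_hat[i] == y[i] for i in idx) / len(idx)
    return max(acc.values()) - min(acc.values())

y_hat = [1, 0, 1, 1, 0, 0]   # model predictions
y     = [1, 0, 0, 1, 1, 1]   # ground-truth labels
g     = ["a", "a", "a", "b", "b", "b"]  # group labels (or a proxy for them)
# Group "a" is 2/3 correct, group "b" is 1/3 correct -> disparity of 1/3.
disparity = group_disparity(y_hat, y, g)
```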
    ExPLoit: Extracting Private Labels in Split Learning. (arXiv:2112.01299v2 [cs.CR] UPDATED)
    Split learning is a popular technique used for vertical federated learning (VFL), where the goal is to jointly train a model on the private input and label data held by two parties. This technique uses a split-model, trained end-to-end, by exchanging the intermediate representations (IR) of the inputs and gradients of the IR between the two parties. We propose ExPLoit - a label-leakage attack that allows an adversarial input-owner to extract the private labels of the label-owner during split-learning. ExPLoit frames the attack as a supervised learning problem by using a novel loss function that combines gradient-matching and several regularization terms developed using key properties of the dataset and models. Our evaluations show that ExPLoit can uncover the private labels with near-perfect accuracy of up to 99.96%. Our findings underscore the need for better training techniques for VFL.
    A Study of Causal Confusion in Preference-Based Reward Learning. (arXiv:2204.06601v1 [cs.LG])
    Learning robot policies via preference-based reward learning is an increasingly popular method for customizing robot behavior. However, in recent years, there has been a growing body of anecdotal evidence that learning reward functions from preferences is prone to spurious correlations and reward gaming or hacking behaviors. While there is much anecdotal, empirical, and theoretical analysis of causal confusion and reward gaming behaviors both in reinforcement learning and imitation learning approaches that directly map from states to actions, we provide the first systematic study of causal confusion in the context of learning reward functions from preferences. To facilitate this study, we identify a set of three preference learning benchmark domains where we observe causal confusion when learning from offline datasets of pairwise trajectory preferences: a simple reacher domain, an assistive feeding domain, and an itch-scratching domain. To gain insight into this observed causal confusion, we present a sensitivity analysis that explores the effect of different factors--including the type of training data, reward model capacity, and feature dimensionality--on the robustness of rewards learned from preferences. We find evidence that learning rewards from pairwise trajectory preferences is highly sensitive and non-robust to spurious features and increasing model capacity, but not as sensitive to the type of training data. Videos, code, and supplemental results are available at https://sites.google.com/view/causal-reward-confusion.
    deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks. (arXiv:2204.06815v1 [cs.LG])
    A lot of Machine Learning (ML) and Deep Learning (DL) research is of an empirical nature. Nevertheless, statistical significance testing (SST) is still not widely used. This endangers true progress, as seeming improvements over a baseline might be statistical flukes, leading follow-up research astray while wasting human and computational resources. Here, we provide an easy-to-use package containing different significance tests and utility functions specifically tailored towards research needs and usability.
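The kind of test such a package automates can be illustrated with a plain permutation test on two sets of model scores. This is a generic sketch of the idea, not the package's actual API, and the scores are made up:

```python
import random

def permutation_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided permutation test for the difference in mean scores:
    how often does a random relabeling produce a gap at least as large?"""
    rng = random.Random(seed)
    observed = abs(sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b))
    pooled = scores_a + scores_b
    n = len(scores_a)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n))
        if diff >= observed:
            hits += 1
    return hits / trials

# Clearly separated score distributions -> small p-value.
p = permutation_test([0.81, 0.83, 0.80, 0.82], [0.70, 0.69, 0.71, 0.72])
assert p < 0.05
```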
    To Split or Not to Split: The Impact of Disparate Treatment in Classification. (arXiv:2002.04788v4 [cs.LG] UPDATED)
    Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions where a group-blind classifier will always suffer from a non-trivial performance gap from the split classifiers. In the finite sample regime, splitting is not necessarily beneficial and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.
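The benefit-of-splitting can be made concrete with a tiny 1-D example in which the two groups' label rules conflict, so any group-blind threshold classifier must fail on one of them. This is an illustrative toy, not the paper's convex-program computation:

```python
def best_threshold_acc(xs, ys):
    """Accuracy of the best 1-D threshold classifier (either direction)."""
    best = 0.0
    for t in sorted(set(xs)):
        for sign in (1, -1):
            acc = sum((sign * (x - t) >= 0) == y for x, y in zip(xs, ys)) / len(xs)
            best = max(best, acc)
    return best

# Two groups whose label rules point in opposite directions (made-up data).
xa, ya = [0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]   # group A: high x -> label 1
xb, yb = [0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0]   # group B: high x -> label 0

split_acc = (best_threshold_acc(xa, ya) + best_threshold_acc(xb, yb)) / 2  # 1.0
blind_acc = best_threshold_acc(xa + xb, ya + yb)                           # 0.5
benefit = split_acc - blind_acc
assert benefit > 0  # splitting strictly helps here
```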
    OpenCSI: An Open-Source Dataset for Indoor Localization Using CSI-Based Fingerprinting. (arXiv:2104.07963v3 [eess.SP] UPDATED)
    Many applications require accurate indoor localization. Fingerprint-based localization methods propose a solution to this problem, but rely on a radio map that is effort-intensive to acquire. We automate the radio map acquisition phase using a software-defined radio (SDR) and a wheeled robot. Furthermore, we open-source a radio map acquired with our automated tool for a 3GPP Long-Term Evolution (LTE) wireless link. To the best of our knowledge, this is the first publicly available radio map containing channel state information (CSI). Finally, we describe first localization experiments on this radio map using a convolutional neural network to regress for location coordinates.
    Tight Bounds for Quantum State Certification with Incoherent Measurements. (arXiv:2204.07155v1 [quant-ph])
    We consider the problem of quantum state certification, where we are given the description of a mixed state $\sigma \in \mathbb{C}^{d \times d}$, $n$ copies of a mixed state $\rho \in \mathbb{C}^{d \times d}$, and $\varepsilon > 0$, and we are asked to determine whether $\rho = \sigma$ or whether $\| \rho - \sigma \|_1 > \varepsilon$. When $\sigma$ is the maximally mixed state $\frac{1}{d} I_d$, this is known as mixedness testing. We focus on algorithms which use incoherent measurements, i.e. which only measure one copy of $\rho$ at a time. Unlike those that use entangled, multi-copy measurements, these can be implemented without persistent quantum memory and thus represent a large class of protocols that can be run on current or near-term devices. For mixedness testing, there is a folklore algorithm which uses incoherent measurements and only needs $O(d^{3/2} / \varepsilon^2)$ copies. The algorithm is non-adaptive, that is, its measurements are fixed ahead of time, and is known to be optimal for non-adaptive algorithms. However, when the algorithm can make arbitrary incoherent measurements, the best known lower bound is only $\Omega (d^{4/3} / \varepsilon^2)$ [Bubeck-Chen-Li '20], and it has been an outstanding open problem to close this polynomial gap. In this work, 1) we settle the copy complexity of mixedness testing with incoherent measurements and show that $\Omega (d^{3/2} / \varepsilon^2)$ copies are necessary, and 2) we show the instance-optimal bounds for state certification to general $\sigma$ first derived by [Chen-Li-O'Donnell '21] for non-adaptive measurements also hold for arbitrary incoherent measurements. Qualitatively, our results say that adaptivity does not help at all for these problems. Our results are based on new techniques that allow us to reduce the problem to understanding certain matrix martingales, which we believe may be of independent interest.
    CLUES: A Benchmark for Learning Classifiers using Natural Language Explanations. (arXiv:2204.07142v1 [cs.CL])
    Supervised learning has traditionally focused on inductive learning by observing labeled examples of a task. In contrast, humans have the ability to learn new concepts from language. Here, we explore training zero-shot classifiers for structured data purely from language. For this, we introduce CLUES, a benchmark for Classifier Learning Using natural language ExplanationS, consisting of a range of classification tasks over structured data along with natural language supervision in the form of explanations. CLUES consists of 36 real-world and 144 synthetic classification tasks. It contains crowdsourced explanations describing real-world tasks from multiple teachers and programmatically generated explanations for the synthetic tasks. To model the influence of explanations in classifying an example, we develop ExEnt, an entailment-based model that learns classifiers using explanations. ExEnt generalizes up to 18% better (relative) on novel tasks than a baseline that does not use explanations. We delineate key challenges for automated learning from explanations, addressing which can lead to progress on CLUES in the future. Code and datasets are available at: https://clues-benchmark.github.io.
    Modeling the effects of environmental and perceptual uncertainty using deterministic reinforcement learning dynamics with partial observability. (arXiv:2109.07259v2 [nlin.AO] UPDATED)
    Assessing the systemic effects of uncertainty that arises from agents' partial observation of the true states of the world is critical for understanding a wide range of scenarios. Yet, previous modeling work on agent learning and decision-making either lacks a systematic way to describe this source of uncertainty or puts the focus on obtaining optimal policies using complex models of the world that would impose an unrealistically high cognitive demand on real agents. In this work we aim to efficiently describe the emergent behavior of biologically plausible and parsimonious learning agents faced with partially observable worlds. Therefore we derive and present deterministic reinforcement learning dynamics where the agents observe the true state of the environment only partially. We showcase the broad applicability of our dynamics across different classes of partially observable agent-environment systems. We find that partial observability creates unintuitive benefits in a number of specific contexts, pointing the way to further research on a general understanding of such effects. For instance, partially observant agents can learn better outcomes faster, in a more stable way and even overcome social dilemmas. Furthermore, our method allows the application of dynamical systems theory to partially observable multiagent learning. In this regard we find the emergence of catastrophic limit cycles, a critical slowing down of the learning processes between reward regimes and the separation of the learning dynamics into fast and slow directions, all caused by partial observability. Therefore, the presented dynamics have the potential to become a formal, yet practical, lightweight and robust tool for researchers in biology, social science and machine learning to systematically investigate the effects of interacting partially observant agents.
    Multimodal spatiotemporal graph neural networks for improved prediction of 30-day all-cause hospital readmission. (arXiv:2204.06766v1 [cs.LG])
    Measures to predict 30-day readmission are considered an important quality factor for hospitals as accurate predictions can reduce the overall cost of care by identifying high risk patients before they are discharged. While recent deep learning-based studies have shown promising empirical results on readmission prediction, several limitations exist that may hinder widespread clinical utility, such as (a) only patients with certain conditions are considered, (b) existing approaches do not leverage data temporality, (c) individual admissions are assumed independent of each other, which is unrealistic, (d) prior studies are usually limited to a single source of data and single-center data. To address these limitations, we propose a multimodal, modality-agnostic spatiotemporal graph neural network (MM-STGNN) for prediction of 30-day all-cause hospital readmission that fuses multimodal in-patient longitudinal data. By training and evaluating our methods using longitudinal chest radiographs and electronic health records from two independent centers, we demonstrate that MM-STGNN achieves AUROC of 0.79 on both primary and external datasets. Furthermore, MM-STGNN significantly outperforms the current clinical reference standard, LACE+ score (AUROC=0.61), on the primary dataset. For subset populations of patients with heart and vascular disease, our model also outperforms baselines on predicting 30-day readmission (e.g., 3.7 point improvement in AUROC in patients with heart disease). Lastly, qualitative model interpretability analysis indicates that while patients' primary diagnoses were not explicitly used to train the model, node features crucial for model prediction directly reflect patients' primary diagnoses. Importantly, our MM-STGNN is agnostic to node feature modalities and could be utilized to integrate multimodal data for triaging patients in various downstream resource allocation tasks.
    EvoSTS Forecasting: Evolutionary Sparse Time-Series Forecasting. (arXiv:2204.07066v1 [cs.NE])
    In this work, we present our novel evolutionary sparse time-series forecasting algorithm, EvoSTS. The algorithm attempts to evolutionarily prioritize the weights of a Long Short-Term Memory (LSTM) network that best minimize the reconstruction loss of a predicted signal using a learned sparse coded dictionary. In each generation of our evolutionary algorithm, a set number of children with the same initial weights are spawned. Each child undergoes a training step and adjusts its weights on the same data. Due to stochastic back-propagation, the set of children has a variety of weights with different levels of performance. The weights that best minimize the reconstruction loss with a given signal dictionary are passed to the next generation. The predictions from the best-performing weights of the first and last generation are compared. We found improvements while comparing the weights of these two generations; however, due to several confounding parameters and hyperparameter limitations, some of the weights had negligible improvements. To the best of our knowledge, this is the first attempt to use sparse coding in this way to optimize the weights of a time-series forecasting model, such as an LSTM network.
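The spawn-train-select loop can be sketched on a toy scalar "model" whose loss stands in for the reconstruction loss. The learning rate, noise level, population size and quadratic objective are all hypothetical; the paper applies this to LSTM weights and a sparse-coding dictionary:

```python
import random

TARGET = 3.0  # toy optimum standing in for the best-fitting weights

def train_step(w, rng, lr=0.1):
    """One noisy gradient step on (w - TARGET)^2, mimicking stochastic
    back-propagation: each child sees a different gradient realization."""
    grad = 2 * (w - TARGET) + rng.gauss(0.0, 0.5)
    return w - lr * grad

def loss(w):
    return (w - TARGET) ** 2  # stands in for the reconstruction loss

rng = random.Random(0)
w = 0.0  # shared initial weights
for generation in range(20):
    # Spawn children from the same parent weights, train each independently.
    children = [train_step(w, rng) for _ in range(8)]
    # Pass the best-performing child's weights to the next generation.
    w = min(children, key=loss)

assert loss(w) < loss(0.0)  # selection drove the weights toward the optimum
```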
    A deep learning algorithm for reducing false positives in screening mammography. (arXiv:2204.06671v1 [cs.CV])
    Screening mammography improves breast cancer outcomes by enabling early detection and treatment. However, false positive callbacks for additional imaging from screening exams cause unnecessary procedures, patient anxiety, and financial burden. This work demonstrates an AI algorithm that reduces false positives by identifying mammograms not suspicious for breast cancer. We trained the algorithm to determine the absence of cancer using 123,248 2D digital mammograms (6,161 cancers) and performed a retrospective study on 14,831 screening exams (1,026 cancers) from 15 US and 3 UK sites. Retrospective evaluation of the algorithm on the largest of the US sites (11,592 mammograms, 101 cancers) a) left the cancer detection rate unaffected (p=0.02, non-inferiority margin 0.25 cancers per 1000 exams), b) reduced callbacks for diagnostic exams by 31.1% compared to standard clinical readings, c) reduced benign needle biopsies by 7.4%, and d) reduced screening exams requiring radiologist interpretation by 41.6% in the simulated clinical workflow. This work lays the foundation for semi-autonomous breast cancer screening systems that could benefit patients and healthcare systems by reducing false positives, unnecessary procedures, patient anxiety, and expenses.
    SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study. (arXiv:2204.06699v1 [cs.LG])
    Self-supervised pre-training methods have brought remarkable breakthroughs in the understanding of text, image, and speech. Recent developments in genomics have also adopted these pre-training methods for genome understanding. However, they focus only on understanding haploid sequences, which hinders their applicability to understanding genetic variations, also known as single nucleotide polymorphisms (SNPs), which are crucial for genome-wide association studies. In this paper, we introduce SNP2Vec, a scalable self-supervised pre-training approach for understanding SNPs. We apply SNP2Vec to perform long-sequence genomics modeling, and we evaluate the effectiveness of our approach on predicting Alzheimer's disease risk in a Chinese cohort. Our approach significantly outperforms existing polygenic risk score methods and all other baselines, including the model that is trained entirely with haploid sequences. We release our code and dataset at https://github.com/HLTCHKUST/snp2vec.
    Data Augmentation for Bayesian Deep Learning. (arXiv:1903.09668v3 [stat.ML] UPDATED)
    Deep Learning (DL) methods have emerged as one of the most powerful tools for functional approximation and prediction. While the representation properties of DL have been well studied, uncertainty quantification remains challenging and largely unexplored. Data augmentation techniques are a natural approach to provide uncertainty quantification and to incorporate stochastic Monte Carlo search into stochastic gradient descent (SGD) methods. The purpose of our paper is to show that training DL architectures with data augmentation leads to efficiency gains. We use the theory of scale mixtures of normals to derive data augmentation strategies for deep learning. This allows variants of the expectation-maximization and MCMC algorithms to be brought to bear on these high dimensional nonlinear deep learning models. To demonstrate our methodology, we develop data augmentation algorithms for a variety of commonly used activation functions: logit, ReLU, leaky ReLU and SVM. Our methodology is compared to traditional stochastic gradient descent with back-propagation. Our optimization procedure leads to a version of iteratively re-weighted least squares and can be implemented at scale with accelerated linear algebra methods providing substantial improvement in speed. We illustrate our methodology on a number of standard datasets. Finally, we conclude with directions for future research.
    Twitter User Representation Using Weakly Supervised Graph Embedding. (arXiv:2108.08988v3 [cs.CL] UPDATED)
    Social media platforms provide convenient means for users to participate in multiple online activities on various contents and create fast widespread interactions. However, this rapidly growing access has also increased the diverse information, and characterizing user types to understand people's lifestyle decisions shared in social media is challenging. In this paper, we propose a weakly supervised graph embedding based framework for understanding user types. We evaluate the user embedding learned using weak supervision over well-being related tweets from Twitter, focusing on 'Yoga' and 'Keto diet'. Experiments on real-world datasets demonstrate that the proposed framework outperforms the baselines for detecting user types. Finally, we illustrate data analysis on different types of users (e.g., practitioner vs. promotional) from our dataset. While we focus on lifestyle-related tweets (i.e., yoga, keto), our method for constructing user representation readily generalizes to other domains.
    ICSML: Industrial Control Systems Machine Learning Inference Framework natively executing on IEC 61131-3 compliant devices. (arXiv:2202.10075v2 [cs.LG] UPDATED)
    Industrial Control Systems (ICS) have played a catalytic role in enabling the 4th Industrial Revolution. ICS devices like Programmable Logic Controllers (PLCs), automate, monitor, and control critical processes in industrial, energy, and commercial environments. The convergence of traditional Operational Technology (OT) with Information Technology (IT) has opened a new and unique threat landscape. This has inspired defense research that focuses heavily on Machine Learning (ML) based anomaly detection methods that run on external IT hardware, which means an increase in costs and the further expansion of the threat landscape. To remove this requirement, we introduce the ICS machine learning inference framework (ICSML) which enables the execution of ML model inference natively on the PLC. ICSML is implemented in IEC 61131-3 code and provides several optimizations to bypass the limitations imposed by the domain-specific languages. Therefore, it works \emph{on every PLC without the need for vendor support}. ICSML provides a complete set of components for the creation of full ML models similarly to established ML frameworks. We run a series of benchmarks studying memory and performance and compare our solution to the TFLite inference framework. At the same time, we develop domain-specific model optimizations to improve the efficiency of ICSML. To demonstrate the abilities of ICSML, we evaluate a case study of a real defense for process-aware attacks targeting a desalination plant.
    Measurement-based Admission Control in Sliced Networks: A Best Arm Identification Approach. (arXiv:2204.06910v1 [cs.NI])
    In sliced networks, the shared tenancy of slices requires adaptive admission control of data flows, based on measurements of network resources. In this paper, we investigate the design of measurement-based admission control schemes, deciding whether a new data flow can be admitted and in this case, on which slice. The objective is to devise a joint measurement and decision strategy that returns a correct decision (e.g., the least loaded slice) with a certain level of confidence while minimizing the measurement cost (the number of measurements made before committing to the decision). We study the design of such strategies for several natural admission criteria specifying what a correct decision is. For each of these criteria, using tools from best arm identification in bandits, we first derive an explicit information-theoretical lower bound on the cost of any algorithm returning the correct decision with fixed confidence. We then devise a joint measurement and decision strategy achieving this theoretical limit. We compare empirically the measurement costs of these strategies, and compare them both to the lower bounds as well as a naive measurement scheme. We find that our algorithm significantly outperforms the naive scheme (by a factor $2-8$).
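A fixed-confidence scheme in this spirit can be sketched with Hoeffding-style confidence intervals and successive elimination. This is a generic best-arm-identification sketch with made-up loads, not the paper's information-theoretically optimal strategy:

```python
import math, random

def least_loaded_slice(measure, n_slices, delta=0.05, max_measurements=100000):
    """Sample slice loads (assumed in [0, 1]) until one slice is the least
    loaded with confidence 1 - delta, eliminating hopeless slices as we go."""
    counts = [0] * n_slices
    sums = [0.0] * n_slices
    alive = set(range(n_slices))
    total = 0
    while len(alive) > 1 and total < max_measurements:
        for s in list(alive):
            sums[s] += measure(s)
            counts[s] += 1
            total += 1
        means = {s: sums[s] / counts[s] for s in alive}
        rad = {s: math.sqrt(math.log(2 * n_slices * counts[s] ** 2 / delta)
                            / (2 * counts[s]))
               for s in alive}
        best = min(alive, key=lambda s: means[s])
        # Drop slices whose lower confidence bound exceeds best's upper bound.
        alive = {s for s in alive
                 if means[s] - rad[s] <= means[best] + rad[best]}
    return min(alive, key=lambda s: sums[s] / counts[s]), total

rng = random.Random(1)
true_loads = [0.7, 0.3, 0.6]  # hypothetical per-slice loads
choice, cost = least_loaded_slice(
    lambda s: min(1.0, max(0.0, rng.gauss(true_loads[s], 0.1))), 3)
assert choice == 1  # slice 1 is the least loaded
```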
    Surface Similarity Parameter: A New Machine Learning Loss Metric for Oscillatory Spatio-Temporal Data. (arXiv:2204.06843v1 [cs.LG])
    Supervised machine learning approaches require the formulation of a loss functional to be minimized in the training phase. Sequential data are ubiquitous across many fields of research, and are often treated with Euclidean distance-based loss functions that were designed for tabular data. For smooth oscillatory data, those conventional approaches lack the ability to penalize amplitude, frequency and phase prediction errors at the same time, and tend to be biased towards amplitude errors. We introduce the surface similarity parameter (SSP) as a novel loss function that is especially useful for training machine learning models on smooth oscillatory sequences. Our extensive experiments on chaotic spatio-temporal dynamical systems indicate that the SSP is beneficial for shaping gradients, thereby accelerating the training process, reducing the final prediction error, and implementing a stronger regularization effect compared to using classical loss functions. The results indicate the potential of the novel loss metric particularly for highly complex and chaotic data, such as data stemming from the nonlinear two-dimensional Kuramoto-Sivashinsky equation and the linear propagation of dispersive surface gravity waves in fluids.
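A common time-domain form of the surface similarity parameter is the Euclidean distance normalized by the sum of the two signals' norms, giving 0 for identical and 1 for phase-opposed sequences; the paper works with spectral and spatio-temporal fields, so treat this as an illustrative sketch:

```python
import math

def ssp(y_true, y_pred):
    """Surface similarity parameter: normalized Euclidean distance in [0, 1]."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)))
    den = (math.sqrt(sum(a ** 2 for a in y_true))
           + math.sqrt(sum(b ** 2 for b in y_pred)))
    return num / den if den else 0.0

wave     = [math.sin(0.1 * t) for t in range(100)]
shifted  = [math.sin(0.1 * t + 0.5) for t in range(100)]  # phase error
inverted = [-x for x in wave]                              # worst case

assert ssp(wave, wave) == 0.0
assert 0.0 < ssp(wave, shifted) < ssp(wave, inverted)  # phase errors penalized
assert abs(ssp(wave, inverted) - 1.0) < 1e-9
```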
    Exploring the Distributed Knowledge Congruence in Proxy-data-free Federated Distillation. (arXiv:2204.07028v1 [cs.LG])
    Federated learning (FL) is a distributed machine learning paradigm in which the server periodically aggregates local model parameters from clients without assembling their private data. User-constrained communication bandwidth and the requirement for personalized models pose severe challenges to FL. Federated distillation (FD) is proposed to simultaneously address the two problems, which exchanges knowledge between the server and clients, supporting heterogeneous local models while significantly reducing communication overhead. However, most existing FD methods require a proxy dataset, which is often unavailable. Proxy-data-free FD approaches eliminate the need for additional public data beyond clients' private data, but suffer from remarkable discrepancy among local knowledge due to model heterogeneity, leading to ambiguous representation on the server and inevitable accuracy degradation. To tackle this issue, we propose a proxy-data-free FD algorithm based on distributed knowledge congruence (FedDKC). FedDKC leverages well-designed refinement strategies to narrow local knowledge differences into an acceptable upper bound to mitigate the negative effects of knowledge incongruence. Specifically, from perspectives of peak probability and Shannon entropy of local knowledge, we design kernel-based knowledge refinement (KKR) and searching-based knowledge refinement (SKR) respectively, and theoretically guarantee the refined-local knowledge can satisfy an approximately-similar distribution and be regarded as congruent. Extensive experiments conducted on three common datasets demonstrate that our proposed FedDKC method outperforms the state-of-the-art in 93.33% of comparisons, and achieves faster convergence without increasing communication overhead.
    Assessing the communication gap between AI models and healthcare professionals: explainability, utility and trust in AI-driven clinical decision-making. (arXiv:2204.05030v2 [cs.AI] UPDATED)
    This paper contributes a pragmatic evaluation framework for explainable Machine Learning (ML) models for clinical decision support. The study revealed a more nuanced role for ML explanation models when these are pragmatically embedded in the clinical context. Despite the generally positive attitude of healthcare professionals (HCPs) towards explanations as a safety and trust mechanism, for a significant set of participants there were negative effects associated with confirmation bias, accentuating model over-reliance and increased effort to interact with the model. Also, contradicting one of its main intended functions, standard explanatory models showed limited ability to support a critical understanding of the limitations of the model. However, we found new significant positive effects which reposition the role of explanations within a clinical context: these include reduction of automation bias, addressing ambiguous clinical cases (cases where HCPs were not certain about their decision), and support of less experienced HCPs in the acquisition of new domain knowledge.
    The multi-modal universe of fast-fashion: the Visuelle 2.0 benchmark. (arXiv:2204.06972v1 [cs.CV])
    We present Visuelle 2.0, the first dataset useful for tackling the diverse prediction problems that a fast-fashion company has to manage routinely. Furthermore, we demonstrate how the use of computer vision is substantial in this scenario. Visuelle 2.0 contains data for 6 seasons / 5355 clothing products of Nuna Lie, a famous Italian company with hundreds of shops located in different areas within the country. In particular, we focus on a specific prediction problem, namely short-observation new product sale forecasting (SO-fore). SO-fore assumes that the season has started and a set of new products is on the shelves of the different stores. The goal is to forecast the sales for a particular horizon, given a short available past (a few weeks), since no earlier statistics are available. To be successful, SO-fore approaches should capture this short past and exploit other modalities or exogenous data. To these aims, Visuelle 2.0 is equipped with disaggregated data at the item-shop level and multi-modal information for each clothing item, allowing computer vision approaches to come into play. The main message that we deliver is that the use of image data with deep networks boosts the performance obtained when using the time series alone in long-term forecasting scenarios, improving the WAPE by 8.2% and the MAE by 7.7%. The dataset is available at: https://humaticslab.github.io/forecasting/visuelle.
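The two reported error metrics are straightforward to compute; a minimal sketch with made-up sales figures (WAPE normalizes the total absolute error by total actual sales, so it is robust to items with very different volumes):

```python
def wape(actual, forecast):
    """Weighted absolute percentage error: total absolute error / total sales."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(actual)

def mae(actual, forecast):
    """Mean absolute error per item."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

sales    = [120, 80, 60, 140]  # hypothetical per-item sales
forecast = [100, 90, 50, 150]
assert abs(wape(sales, forecast) - 0.125) < 1e-9  # 50 / 400 = 12.5%
assert mae(sales, forecast) == 12.5
```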
    Optimal Stopping via Randomized Neural Networks. (arXiv:2104.13669v2 [stat.ML] UPDATED)
    This paper presents new machine learning approaches to approximate the solutions of optimal stopping problems. The key idea of these methods is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using simple linear regression, they are easy to implement and theoretical guarantees are provided. Our randomized reinforcement learning approach and randomized recurrent neural network approach outperform the state-of-the-art and other relevant machine learning approaches in Markovian and non-Markovian examples, respectively. In particular, we test our approaches on Black-Scholes, Heston, rough Heston and fractional Brownian motion. Moreover, we show that they can also be used to efficiently compute Greeks of American options.
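The core idea, a randomly drawn, frozen hidden layer with only a trained linear readout, can be sketched on a 1-D regression toy. The sizes, learning rate and target function are hypothetical, and the readout is fitted here by plain gradient descent, whereas the paper uses closed-form linear regression to approximate continuation values:

```python
import math, random

rng = random.Random(0)

# Random hidden layer: weights are drawn once and never trained.
H = 20
w_in = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(H)]  # (weight, bias)

def features(x):
    return [math.tanh(w * x + b) for w, b in w_in]

# Only the last (linear) layer is trained -- a convex problem.
target = lambda x: math.sin(3 * x)  # toy stand-in for a continuation value
xs = [i / 50 for i in range(-50, 51)]
beta = [0.0] * H
for _ in range(1000):
    for x in xs:
        phi = features(x)
        err = sum(b * p for b, p in zip(beta, phi)) - target(x)
        beta = [b - 0.05 * err * p for b, p in zip(beta, phi)]

pred = lambda x: sum(b * p for b, p in zip(beta, features(x)))
mse = sum((pred(x) - target(x)) ** 2 for x in xs) / len(xs)
assert mse < 0.05  # random features + linear readout fit the curve well
```

Since only the readout is optimized, the fit could equally be obtained in one shot by least squares, which is what makes the approach cheap and gives it theoretical guarantees.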
    HCR-Net: A deep learning based script independent handwritten character recognition network. (arXiv:2108.06663v2 [cs.CV] UPDATED)
    Despite being studied extensively for a few decades, handwritten character recognition (HCR) is considered a challenging learning problem in pattern recognition, and there is very limited research on script-independent models. This is mainly because of the diversity of scripts, the focus of conventional research on handcrafted feature extraction techniques, and the unavailability of public datasets and code to reproduce results. On the other hand, deep learning has witnessed huge success in different areas of pattern recognition, including HCR, and provides end-to-end learning, but it has been studied for specific scripts only. In this paper, we propose a novel deep learning architecture, called HCR-Net, which exploits transfer learning and image augmentation for end-to-end learning of script-independent handwritten character recognition. HCR-Net is based on a novel transfer learning approach for HCR, where some of the lower layers of a pre-trained network are utilized. Due to transfer learning and image augmentation, HCR-Net provides faster training, better performance and better generalization, and can reach up to 99% of its final accuracy in just the first epoch. Experimental results on publicly available datasets of the Bangla, Punjabi, Hindi, English, Swedish, Urdu, Farsi, Tibetan, Kannada, Malayalam, Telugu, Marathi, Nepali and Arabic languages prove the efficacy of HCR-Net and establish several new benchmarks. For reproducibility of the results and for the advancement of HCR research, the complete code is publicly released at https://github.com/jmdvinodjmd/HCR-Net.
    A Study of Low-Resource Speech Commands Recognition based on Adversarial Reprogramming. (arXiv:2110.03894v2 [eess.AS] UPDATED)
    In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system. The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model (from the source domain). To solve the label mismatches between source and target domains, and to further improve the stability of AR, we propose a novel similarity-based label mapping technique to align classes. In addition, the transfer learning (TL) technique is combined with the original AR process to improve the model adaptation capability. We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech. Experimental results show that, with an acoustic model (AM) pretrained on a large-scale English dataset, the proposed AR-SCR system outperforms the current state-of-the-art results on the Arabic and Lithuanian speech commands datasets, with only a limited amount of training data.
    Regret, stability & fairness in matching markets with bandit learners. (arXiv:2102.06246v2 [cs.LG] UPDATED)
    Making an informed decision -- for example, when choosing a career or housing -- requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition -- namely, costs and transfers -- we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.
    Real-time Adversarial Perturbations against Deep Reinforcement Learning Policies: Attacks and Defenses. (arXiv:2106.08746v3 [cs.LG] UPDATED)
    Recent work has shown that deep reinforcement learning (DRL) policies are vulnerable to adversarial perturbations. Adversaries can mislead the policies of DRL agents by perturbing the state of the environment observed by the agents. Existing attacks are feasible in principle but face challenges in practice, either by being too slow to fool DRL policies in real time or by modifying past observations stored in the agent's memory. We show that using the Universal Adversarial Perturbation (UAP) method to compute perturbations, independent of the individual inputs to which they are applied, can fool DRL policies effectively and in real time. We describe three such attack variants. Via an extensive evaluation using three Atari 2600 games, we show that our attacks are effective, as they fully degrade the performance of three different DRL agents (up to 100%, even when the $l_\infty$ bound on the perturbation is as small as 0.01). Our attacks are faster than the response time (0.6 ms on average) of the different DRL policies, and considerably faster than prior attacks using adversarial perturbations (1.8 ms on average). We also show that our attack technique is efficient, incurring an online computational cost of 0.027 ms on average. Using two further tasks involving robotic movement, we confirm that our results generalize to more complex DRL tasks. Furthermore, we demonstrate that the effectiveness of known defenses diminishes against universal perturbations. We propose an effective technique that detects all known adversarial perturbations against DRL policies, including all the universal perturbations presented in this paper.
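    A minimal illustration of why a single input-independent perturbation can work, using a toy linear policy rather than a DRL agent; the FGSM-style direction `-eps * sign(w)` is an assumed stand-in, not the paper's UAP computation:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy linear "policy": action = sign(w . x). Instead of crafting a new
# perturbation per state, one universal delta is added to every observation.
dim, n = 16, 200
w = rng.standard_normal(dim)
X = rng.standard_normal((n, dim)) + 0.3 * w  # states the policy mostly maps to +1

eps = 0.5
delta = -eps * np.sign(w)  # input-independent, bounded by eps in l_inf

before = float(np.mean(np.sign(X @ w) == 1))  # fraction of "correct" actions
after = float(np.mean(np.sign((X + delta) @ w) == 1))
print(before, after)
```

    Because delta is computed once offline, applying it at test time costs a single vector addition, which is the real-time property the attack exploits.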
    Extracting Finite Automata from RNNs Using State Merging. (arXiv:2201.12451v3 [cs.LG] UPDATED)
    One way to interpret the behavior of a blackbox recurrent neural network (RNN) is to extract from it a more interpretable discrete computational model, like a finite state machine, that captures its behavior. In this work, we propose a new method for extracting finite automata from RNNs inspired by the state merging paradigm from grammatical inference. We demonstrate the effectiveness of our method on the Tomita languages benchmark, where we find that it is able to extract faithful automata from RNNs trained on all languages in the benchmark. We find that extraction performance is aided by the amount of data provided during the extraction process, as well as, curiously, whether the RNN model is trained for additional epochs after perfectly learning its target language. We use our method to analyze this phenomenon, finding that training beyond convergence is useful because it leads to compression of the internal state space of the RNN. This finding demonstrates how our method can be used for interpretability and analysis of trained RNN models.
    SkillNet: A Sparsely Activated Model for General-Purpose Natural Language Understanding. (arXiv:2203.03312v2 [cs.CL] UPDATED)
    Prevailing deep models are single-purpose and overspecialized for individual tasks. However, when extended to new tasks, they typically forget previously learned skills and learn from scratch. We address this issue by introducing SkillNet, a general-purpose model that stitches together existing skills to learn new tasks more effectively. The key feature of our approach is that it is sparsely activated, guided by predefined skills. Different from traditional dense models that always activate all the model parameters, SkillNet only activates parts of the model parameters whose skills are relevant to the target task. When learning a new task, our approach precisely activates required skills and also provides an option to add new skills. We evaluate on natural language understanding tasks and have the following findings. First, with only one model checkpoint, SkillNet performs better than task-specific fine-tuning and two multi-task learning baselines (i.e., a dense model and a Mixture-of-Experts model) on six tasks. Second, sparsely activated pre-training further improves the overall performance. Third, SkillNet significantly outperforms baseline systems when extended to new tasks.
    Fine-Grained Population Mobility Data-Based Community-Level COVID-19 Prediction Model. (arXiv:2202.06257v2 [cs.LG] UPDATED)
    Predicting the number of infections in the anti-epidemic process is extremely beneficial to the government in developing anti-epidemic strategies, especially in fine-grained geographic units. Previous works focus on low-spatial-resolution prediction, e.g., county-level, and preprocess data to the same geographic level, which loses some useful information. In this paper, we propose a fine-grained population mobility data-based model (FGC-COVID) utilizing data at two geographic levels for community-level COVID-19 prediction. We use the population mobility data between Census Block Groups (CBGs), which is a finer-grained geographic level than the community, to build the graph and capture the dependencies between CBGs using graph neural networks (GNNs). To mine patterns at the finest possible granularity for prediction, a spatially weighted aggregation module is introduced to aggregate the embeddings of CBGs to the community level based on their geographic affiliation and spatial autocorrelation. Extensive experiments on 300 days of LA city COVID-19 data indicate that our model outperforms existing forecasting models on community-level COVID-19 prediction.
    Semi-Discriminative Representation Loss for Online Continual Learning. (arXiv:2006.11234v4 [stat.ML] UPDATED)
    The use of episodic memory in continual learning has demonstrated effectiveness for alleviating catastrophic forgetting. In recent studies, gradient-based approaches have been developed to make more efficient use of compact episodic memory. Such approaches refine the gradients resulting from new samples by those from memorized samples, aiming to reduce the diversity of gradients from different tasks. In this paper, we clarify the relation between diversity of gradients and discriminativeness of representations, showing shared as well as conflicting interests between Deep Metric Learning and continual learning, thus demonstrating pros and cons of learning discriminative representations in continual learning. Based on these findings, we propose a simple method -- Semi-Discriminative Representation Loss (SDRL) -- for continual learning. In comparison with state-of-the-art methods, SDRL shows better performance with low computational cost on multiple benchmark tasks in the setting of online continual learning.
    DeePN$^2$: A deep learning-based non-Newtonian hydrodynamic model. (arXiv:2112.14798v3 [physics.comp-ph] UPDATED)
    A long-standing problem in the modeling of non-Newtonian hydrodynamics of polymeric flows is the availability of reliable and interpretable hydrodynamic models that faithfully encode the underlying micro-scale polymer dynamics. The main complication arises from the long polymer relaxation time, the complex molecular structure and heterogeneous interactions. DeePN$^2$, a deep learning-based non-Newtonian hydrodynamic model, has been proposed and has shown some success in systematically passing the micro-scale structural mechanics information to the macro-scale hydrodynamics for suspensions with simple polymer conformation and bond potential. The model retains a multi-scale nature by mapping the polymer configurations into a set of symmetry-preserving macro-scale features. The extended constitutive laws for these macro-scale features can be directly learned from the kinetics of their micro-scale counterparts. In this paper, we develop DeePN$^2$ using more complex micro-structural models. We show that DeePN$^2$ can faithfully capture the broadly overlooked viscoelastic differences arising from the specific molecular structural mechanics without human intervention.
    Stream-based Active Learning with Verification Latency in Non-stationary Environments. (arXiv:2204.06822v1 [cs.LG])
    Data stream classification is an important problem in the field of machine learning. Due to the non-stationary nature of the data, where the underlying distribution changes over time (concept drift), the model needs to continuously adapt to new data statistics. Stream-based Active Learning (AL) approaches address this problem by interactively querying a human expert to provide new data labels for the most recent samples, within a limited budget. Existing AL strategies assume that labels are immediately available, while in a real-world scenario the expert requires time to provide a queried label (verification latency), and by the time the requested labels arrive they may not be relevant anymore. In this article, we investigate the influence of finite, time-variable, and unknown verification delay, in the presence of concept drift, on AL approaches. We propose PRopagate (PR), a latency-independent utility estimator which also predicts the requested, but not yet known, labels. Furthermore, we propose a drift-dependent dynamic budget strategy, which uses a variable distribution of the labelling budget over time after a detected drift. Thorough experimental evaluations, with both synthetic and real-world non-stationary datasets and different settings of verification latency and budget, are conducted and analyzed. We empirically show that the proposed method consistently outperforms the state-of-the-art. Additionally, we demonstrate that with a time-variable budget allocation it is possible to boost the performance of AL strategies without increasing the overall labeling budget.
    LEFM-Nets: Learnable Explicit Feature Map Deep Networks for Segmentation of Histopathological Images of Frozen Sections. (arXiv:2204.06955v1 [eess.IV])
    Accurate segmentation of medical images is essential for the diagnosis and treatment of diseases. These problems are solved by highly complex models, such as deep networks (DN), requiring a large amount of labeled data for training. Moreover, many DNs possess task- or imaging-modality-specific architectures with a decision-making process that is often hard to explain and interpret. Here, we propose a framework that embeds existing DNs into a low-dimensional subspace induced by the learnable explicit feature map (LEFM) layer. Compared to the existing DN, the framework adds one hyperparameter and only modestly increases the number of learnable parameters. The method is aimed at, but not limited to, segmentation of low-dimensional medical images, such as color histopathological images of stained frozen sections. Since features in the LEFM layer are polynomial functions of the original features, the proposed LEFM-Nets contribute to the interpretability of network decisions. In this work, we combined LEFM with the known networks DeepLabv3+, UNet, UNet++ and MA-net. The new LEFM-Nets are applied to the segmentation of colon adenocarcinoma in the liver from images of hematoxylin and eosin (H&E) stained frozen sections. LEFM-Nets are also tested on nuclei segmentation from images of H&E stained frozen sections of ten human organs. On the first problem, LEFM-Nets achieved statistically significant performance improvements over the original networks in terms of micro balanced accuracy and $F_1$ score. On the second problem, LEFM-Nets achieved only marginally better performance than the original networks. The source code is available at https://github.com/dsitnik/lefm.
    A Level Set Theory for Neural Implicit Evolution under Explicit Flows. (arXiv:2204.07159v1 [cs.CV])
    Coordinate-based neural networks parameterizing implicit surfaces have emerged as efficient representations of geometry. They effectively act as parametric level sets with the zero-level set defining the surface of interest. We present a framework that allows applying deformation operations defined for triangle meshes onto such implicit surfaces. Several of these operations can be viewed as energy-minimization problems that induce an instantaneous flow field on the explicit surface. Our method uses the flow field to deform parametric implicit surfaces by extending the classical theory of level sets. We also derive a consolidated view for existing methods on differentiable surface extraction and rendering, by formalizing connections to the level-set theory. We show that these methods drift from the theory and that our approach exhibits improvements for applications like surface smoothing, mean-curvature flow, inverse rendering and user-defined editing on implicit geometry.
    Program Analysis of Probabilistic Programs. (arXiv:2204.06868v1 [cs.PL])
    Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference algorithm can be used as a probabilistic programming back-end that is simultaneously reliable, efficient, black-box, and general. Probabilistic programming languages often choose a single algorithm to apply to a given problem, thus inheriting its limitations. While substantial work has been done both to formalise probabilistic programming and to improve efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. This dissertation presents three novel techniques (both static and dynamic), which aim to improve probabilistic programming using program analysis. The techniques analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand.
    Global Counterfactual Explanations: Investigations, Implementations and Improvements. (arXiv:2204.06917v1 [cs.LG])
    Counterfactual explanations have been widely studied in explainability, with a range of application-dependent methods emerging in fairness, recourse and model understanding. However, the major shortcoming associated with these methods is their inability to provide explanations beyond the local or instance level. While some works touch upon the notion of a global explanation, typically suggesting to aggregate masses of local explanations in the hope of ascertaining global properties, few provide frameworks that are either reliable or computationally tractable. Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to investigate existing global methods, with a focus on implementing and improving Actionable Recourse Summaries (AReS), the only known global counterfactual explanation framework for recourse.
    METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals. (arXiv:2204.06644v1 [cs.LG])
    We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model. Originating from ELECTRA, this training strategy has demonstrated sample-efficiency in pretraining models at the scale of hundreds of millions of parameters. In this work, we conduct a comprehensive empirical study, and propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO), which incorporates some of the best modeling techniques developed recently to speed up, stabilize, and enhance pretrained language models without compromising model effectiveness. The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks. More importantly, the METRO-LM models are efficient in that they often outperform previous large models with significantly smaller model sizes and lower pretraining cost.
    Performance Assessment of different Machine Learning Algorithm for Life-Time Prediction of Solder Joints based on Synthetic Data. (arXiv:2204.06627v1 [cs.LG])
    This paper proposes a computationally efficient methodology to predict the damage progression in solder contacts of electronic components using temperature-time curves. For this purpose, two machine learning algorithms, a Multilayer Perceptron and a Long Short-Term Memory network, are trained and compared with respect to their prediction accuracy and the required amount of training data. The training is performed using synthetic, normally distributed data that is realistic for automotive applications. A finite element model of a simple bipolar chip resistor in surface mount technology configuration is used to numerically compute the synthetic data. As a result, both machine learning algorithms show relevant accuracy for the prediction of accumulated creep strains. With a training data length of 350 hours (12.5% of the available training data), both models show consistently good fitting performance, with an $R^2$ of 0.72 for the Multilayer Perceptron and an $R^2$ of 0.87 for the Long Short-Term Memory network. The prediction errors of the accumulated creep strains are less than 10% with 350 hours of training data and decrease to less than 5% when using further data. Therefore, both approaches are promising for lifetime prediction directly on the electronic device.
    Wassmap: Wasserstein Isometric Mapping for Image Manifold Learning. (arXiv:2204.06645v1 [cs.LG])
    In this paper, we propose Wasserstein Isometric Mapping (Wassmap), a parameter-free nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise quadratic Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds, including those generated by translations or dilations of a fixed generating measure. Additionally, we show that a discrete version of the algorithm retrieves parameters from manifolds generated from discrete measures by providing a theoretical bridge to transfer recovery results from functional data to discrete data. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global techniques.
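    For 1-D point clouds the pipeline reduces to sorted-sample distances followed by classical MDS; this is a simplified sketch under those assumptions, not the paper's image setting, using translates of one generating measure as the manifold:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy "images" as 1-D point clouds: translates of a fixed generating measure.
shifts = np.array([0.0, 1.0, 2.0, 3.0])
base = np.sort(rng.standard_normal(100))
clouds = [base + s for s in shifts]

# For 1-D equal-weight samples, squared W2 is the mean squared difference
# of sorted samples (quantile functions).
n = len(clouds)
D2 = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        D2[i, j] = np.mean((clouds[i] - clouds[j]) ** 2)

# Classical MDS on the squared-distance matrix recovers an isometric embedding.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J               # double-centered Gram matrix
vals, vecs = np.linalg.eigh(B)      # ascending eigenvalues
embed = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))

# Consecutive gaps in the 1-D embedding should match the translation gaps.
gaps = np.abs(np.diff(np.sort(embed)))
print(np.round(gaps, 2))
```

    The recovered gaps equal the true shifts of 1.0, reflecting the exact-recovery property for translation manifolds.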
    Question rewriting? Assessing its importance for conversational question answering. (arXiv:2201.09146v2 [cs.CL] UPDATED)
    In conversational question answering, systems must correctly interpret the interconnected interactions and generate knowledgeable answers, which may require the retrieval of relevant information from a background repository. Recent approaches to this problem leverage neural language models, although different alternatives can be considered in terms of modules for (a) representing user questions in context, (b) retrieving the relevant background information, and (c) generating the answer. This work presents a conversational question answering system designed specifically for the Search-Oriented Conversational AI (SCAI) shared task, and reports on a detailed analysis of its question rewriting module. In particular, we considered different variations of the question rewriting module to evaluate the influence on the subsequent components, and performed a careful analysis of the results obtained with the best system configuration. Our system achieved the best performance in the shared task and our analysis emphasizes the importance of the conversation context representation for the overall system performance.
    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. (arXiv:2202.13001v3 [cs.LG] UPDATED)
    We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. The tasks may be designed by an adversary, but the adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting), and the number of tasks $N$ as well as the total number of rounds $T$ are known ($N$ could be unknown in the meta-learning setting). We design an algorithm based on a reduction to bandit submodular maximization, and show that its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(N\sqrt{M \tau}+N^{2/3})$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2})$ regret.
    LSTM-Autoencoder based Anomaly Detection for Indoor Air Quality Time Series Data. (arXiv:2204.06701v1 [cs.LG])
    Anomaly detection for indoor air quality (IAQ) data has become an important area of research, as the quality of air is closely related to human health and well-being. However, traditional statistics and shallow machine learning-based approaches to anomaly detection in the IAQ area could not detect anomalies involving correlations across several data points (often referred to as long-term dependencies). We propose a hybrid deep learning model that combines an LSTM with an Autoencoder for anomaly detection tasks in IAQ to address this issue. In our approach, the LSTM network is composed of multiple LSTM cells that work with each other to learn the long-term dependencies of the data in a time-series sequence. The Autoencoder identifies the optimal threshold based on the reconstruction loss rates evaluated across all data in all time-series sequences. Our experimental results, based on the Dunedin CO2 time-series dataset obtained through a real-world deployment in schools in New Zealand, demonstrate a very high and robust accuracy rate (99.50%) that outperforms other similar models.
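    The thresholding step can be sketched in isolation; the reconstruction errors below are simulated stand-ins for the LSTM-Autoencoder's losses, and the 99th-percentile cutoff is an assumed choice rather than the paper's optimal-threshold procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in reconstruction errors: in the paper an LSTM-Autoencoder produces
# these; here we simulate normal errors plus a few large anomalous ones.
normal_errors = rng.gamma(shape=2.0, scale=0.05, size=1000)
anomalous_errors = rng.gamma(shape=2.0, scale=0.05, size=10) + 2.0
errors = np.concatenate([normal_errors, anomalous_errors])

# Pick the threshold from the empirical distribution of reconstruction loss
# on normal data; points reconstructed poorly are flagged as anomalies.
threshold = np.quantile(normal_errors, 0.99)

flagged = errors > threshold
print(int(flagged.sum()))
```

    In a real deployment the "normal" error distribution would come from a validation split known to be anomaly-free.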
    BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing. (arXiv:2201.02693v2 [cs.LG] UPDATED)
    Although mission-critical applications require the use of deep neural networks (DNNs), their continuous execution at mobile devices results in a significant increase in energy consumption. While edge offloading can decrease energy consumption, erratic patterns in channel quality, network and edge server load can lead to severe disruption of the system's key operations. An alternative approach, called split computing, generates compressed representations within the model (called "bottlenecks"), to reduce bandwidth usage and energy consumption. Prior work has proposed approaches that introduce additional layers, to the detriment of energy consumption and latency. For this reason, we propose a new framework called BottleFit, which, in addition to targeted DNN architecture modifications, includes a novel training strategy to achieve high accuracy even with strong compression rates. We apply BottleFit on cutting-edge DNN models in image classification, and show that BottleFit achieves 77.1% data compression with up to 0.6% accuracy loss on ImageNet dataset, while state of the art such as SPINN loses up to 6% in accuracy. We experimentally measure the power consumption and latency of an image classification application running on an NVIDIA Jetson Nano board (GPU-based) and a Raspberry PI board (GPU-less). We show that BottleFit decreases power consumption and latency respectively by up to 49% and 89% with respect to (w.r.t.) local computing and by 37% and 55% w.r.t. edge offloading. We also compare BottleFit with state-of-the-art autoencoders-based approaches, and show that (i) BottleFit reduces power consumption and execution time respectively by up to 54% and 44% on the Jetson and 40% and 62% on Raspberry PI; (ii) the size of the head model executed on the mobile device is 83 times smaller. We publish the code repository for reproducibility of the results in this study.
    A Natural Language Processing Approach for Instruction Set Architecture Identification. (arXiv:2204.06624v1 [cs.CR])
    Binary analysis of software is a critical step in cyber forensics applications such as program vulnerability assessment and malware detection. This involves interpreting instructions executed by software and often necessitates converting the software's binary file data to assembly language. The conversion process requires information about the binary file's target instruction set architecture (ISA). However, ISA information might not be included in binary files due to compilation errors, partial downloads, or adversarial corruption of file metadata. Machine learning (ML) is a promising methodology that can be used to identify the target ISA using binary data in the object code section of binary files. In this paper we propose a binary code feature extraction model to improve the accuracy and scalability of ML-based ISA identification methods. Our feature extraction model can be used in the absence of domain knowledge about the ISAs. Specifically, we adapt models from natural language processing (NLP) to i) identify successive byte patterns commonly observed in binary codes, ii) estimate the significance of each byte pattern to a binary file, and iii) estimate the relevance of each byte pattern in distinguishing between ISAs. We introduce character-level features of encoded binaries to identify fine-grained bit patterns inherent to each ISA. We use a dataset with binaries from 12 different ISAs to evaluate our approach. Empirical evaluations show that using our byte-level features in ML-based ISA identification results in an 8% higher accuracy than the state-of-the-art features based on byte-histograms and byte pattern signatures. We observe that character-level features allow reducing the size of the feature set by up to 16x while maintaining accuracy above 97%.
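    A minimal sketch of successive-byte-pattern features with a tf-idf-style weighting; the two byte strings, the ISA names, and the exact weighting formula are illustrative assumptions, not the paper's model:

```python
from collections import Counter
import math

# Toy binaries from two hypothetical ISAs (raw byte strings).
binaries = {
    "isa_a": bytes([0x90, 0x90, 0xE8, 0x90, 0xE8, 0x90]),
    "isa_b": bytes([0x13, 0x05, 0x13, 0x05, 0x13, 0x05]),
}

def byte_bigrams(blob):
    # Successive byte patterns, analogous to word bigrams in NLP.
    return Counter(zip(blob, blob[1:]))

counts = {name: byte_bigrams(blob) for name, blob in binaries.items()}

def tf_idf(name, gram):
    # tf-idf-style weight: frequent within one file, rare across files.
    tf = counts[name][gram] / sum(counts[name].values())
    df = sum(1 for c in counts.values() if gram in c)
    return tf * math.log(len(counts) / df + 1)

w = tf_idf("isa_b", (0x13, 0x05))
print(round(w, 3))
```

    Weights like this form the feature vector a downstream classifier would consume; the paper additionally estimates how well each pattern discriminates between ISAs.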
    Sign Bit is Enough: A Learning Synchronization Framework for Multi-hop All-reduce with Ultimate Compression. (arXiv:2204.06787v1 [cs.LG])
    Traditional one-bit compressed stochastic gradient descent cannot be directly employed in multi-hop all-reduce, a widely adopted distributed training paradigm in network-intensive high-performance computing systems such as public clouds. According to our theoretical findings, the cascading compression considerably deteriorates the convergence performance of the training process. To overcome this limitation, we implement a sign-bit compression-based learning synchronization framework, Marsit. It prevents cascading compression via an elaborate bit-wise operation for unbiased sign aggregation, together with a global compensation mechanism for mitigating compression deviation. The proposed framework retains the same theoretical convergence rate as non-compression mechanisms. Experimental results demonstrate that Marsit reduces training time by up to 35% while preserving the same accuracy as training without compression.
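    The sign-bit majority-vote idea can be sketched as follows; the worker count and shared drift are arbitrary toy choices, and Marsit's actual bit-wise operations and compensation mechanism are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(2)

# Each worker transmits only the sign of its gradient: 1 bit per coordinate.
workers, dim = 5, 8
grads = rng.standard_normal((workers, dim)) + 0.5  # shared descent direction

signs = np.sign(grads)

# Majority vote at the aggregator: summing the votes once and taking the
# sign avoids re-compressing at every hop (the cascading-compression problem).
agg = np.sign(signs.sum(axis=0))
print(agg.tolist())
```

    With an odd number of workers the vote never ties, so the aggregate is itself a valid 1-bit message to broadcast back.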
    Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning. (arXiv:2204.06904v1 [quant-ph])
    Efficient quantum compiling tactics greatly enhance the capability of quantum computers to execute complicated quantum algorithms. Due to its fundamental importance, a plethora of quantum compilers has been designed in past years. However, current protocols suffer from several caveats: low optimality, high inference time, limited scalability, and lack of universality. To compensate for these defects, here we devise an efficient and practical quantum compiler assisted by advanced deep reinforcement learning (RL) techniques, i.e., data generation, deep Q-learning, and AQ* search. In this way, our protocol is compatible with various quantum machines and can be used to compile multi-qubit operators. We systematically evaluate the performance of our proposal in compiling quantum operators with both inverse-closed and inverse-free universal basis sets. In the task of single-qubit operator compiling, our proposal outperforms other RL-based quantum compilers in terms of compiled sequence length and inference time. Meanwhile, the output solution is near-optimal, guaranteed by the Solovay-Kitaev theorem. Notably, for the inverse-free universal basis set, the achieved sequence length complexity is comparable with the inverse-based setting and dramatically advances previous methods. These empirical results contribute to improving the inverse-free Solovay-Kitaev theorem. In addition, for the first time, we demonstrate how to leverage RL-based quantum compilers to accomplish two-qubit operator compiling. The achieved results open an avenue for integrating RL with quantum compiling to unify efficiency and practicality and thus facilitate the exploration of quantum advantages.
    MIMO Channel Estimation using Score-Based Generative Models. (arXiv:2204.07122v1 [eess.SP])
    Channel estimation is a critical task in multiple-input multiple-output digital communications that affects end-to-end system performance. In this work, we introduce a novel approach for channel estimation using deep score-based generative models. These models are trained to estimate the gradient of the log-prior distribution, and can be used to iteratively refine estimates, given observed measurements of a signal. We introduce a framework for training score-based generative models for wireless channels, as well as performing channel estimation using posterior sampling at test time. We derive theoretical robustness guarantees of channel estimation with posterior sampling in single-input single-output scenarios, and show that the observations regarding estimation performance are verified experimentally in MIMO channels. Our results in simulated clustered delay line channels show competitive in-distribution performance without error floors in the high signal-to-noise ratio regime, and robust out-of-distribution performance, outperforming competing deep learning methods by up to 5 dB in end-to-end communication performance, while the complexity analysis reveals how model architecture can efficiently trade performance for estimation latency.
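The posterior-sampling idea above can be illustrated with a 1-D Gaussian toy problem, where the prior and likelihood scores are known in closed form (in the paper a trained score network replaces the analytic prior score; everything below is an assumption-laden sketch, not the authors' algorithm):

```python
import math, random

random.seed(0)

# Toy setup: prior x ~ N(0, 1), measurement y = x + n with n ~ N(0, sigma2).
sigma2 = 0.5   # measurement noise variance (toy value)
y = 1.0        # observed measurement
eta = 1e-3     # Langevin step size

def score_prior(x):
    # d/dx log N(x; 0, 1) -- the quantity a score-based model approximates
    return -x

def score_likelihood(x):
    # d/dx log N(y; x, sigma2), available in closed form for a linear channel
    return (y - x) / sigma2

# Unadjusted Langevin dynamics: follow the posterior score plus noise
x, samples = 0.0, []
for step in range(60000):
    x += eta * (score_prior(x) + score_likelihood(x)) \
         + math.sqrt(2 * eta) * random.gauss(0, 1)
    if step > 10000:  # discard burn-in
        samples.append(x)

post_mean = sum(samples) / len(samples)
true_mean = y / (1 + sigma2)  # analytic Gaussian posterior mean = 2/3
```

The chain's sample mean approaches the analytic posterior mean, which is the sanity check that iterative refinement with scores really performs posterior inference.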
    data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language. (arXiv:2202.03555v2 [cs.LG] UPDATED)
    While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches.
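Two ingredients of the setup described above, the EMA teacher and the contextualized layer-averaged target, can be sketched with plain lists of floats (the real model uses an EMA-tracked Transformer; shapes, `tau`, and `k` below are illustrative assumptions):

```python
def ema_update(teacher_w, student_w, tau=0.9):
    """Self-distillation: teacher parameters track the student via an
    exponential moving average instead of gradient updates."""
    return [tau * t + (1 - tau) * s for t, s in zip(teacher_w, student_w)]

def contextual_target(layer_reps, k=2):
    """data2vec-style regression target: the element-wise average of the
    top-k teacher layer representations, rather than modality-specific
    tokens such as words or visual patches."""
    top = layer_reps[-k:]
    dim = len(top[0])
    return [sum(layer[i] for layer in top) / k for i in range(dim)]

teacher = ema_update([1.0, 1.0], [0.0, 2.0], tau=0.9)  # approx. [0.9, 1.1]
target = contextual_target([[0.0, 0.0], [1.0, 3.0], [3.0, 5.0]], k=2)  # -> [2.0, 4.0]
```

The student then regresses its predictions at masked positions onto `target`; because the target mixes several layers, it carries information from the whole input rather than a single local token.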
    From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks. (arXiv:2204.07018v1 [cs.SD])
    This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network, namely ResNet-18. Our main motivation for focusing on such a front-end classifier rather than other complex architectures is balancing recognition accuracy and the total number of training parameters. Herein, we measure the impact of different settings required for generating more informative Mel-frequency cepstral coefficient (MFCC), short-time Fourier transform (STFT), and discrete wavelet transform (DWT) representations on our front-end model. This measurement involves comparing classification performance against adversarial robustness. We demonstrate an inverse relationship between recognition accuracy and model robustness against six benchmark attack algorithms, taking into account the average budget allocated by the adversary and the attack cost. Moreover, our experimental results have shown that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary than for other 2D representations. We also report results for other architectures, including ResNet-34, ResNet-56, AlexNet, GoogLeNet, SB-CNN, and an LSTM-based model.  ( 2 min )
    Learning Convolutional Neural Networks in Frequency Domain. (arXiv:2204.06718v1 [cs.CV])
    Convolutional neural networks (CNNs) have achieved impressive success in the field of computer vision over the past few decades. As the core of CNNs, the image convolution operation helps CNNs achieve good performance on image-related tasks. However, image convolution is hard to implement and parallelize. In this paper, we propose a novel neural network model, namely CEMNet, that can be trained in the frequency domain. The most important motivation of this research is that, based on the Cross-Correlation Theorem, we can replace image convolution with a very simple element-wise multiplication in the frequency domain. We further introduce a Weight Fixation Mechanism to alleviate over-fitting, and analyze the behavior of Batch Normalization, Leaky ReLU, and Dropout in the frequency domain to design their counterparts for CEMNet. Also, to deal with the complex-valued inputs produced by the DFT, we design a two-branch network structure for CEMNet. Experimental results show that CEMNet works well in the frequency domain and achieves good performance on the MNIST and CIFAR-10 databases. To our knowledge, CEMNet is the first model trained in the Fourier domain that achieves more than 70% validation accuracy on CIFAR-10.  ( 2 min )
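The underlying identity, that circular convolution in the signal domain becomes element-wise multiplication after a DFT, can be checked in a few lines of stdlib Python (a toy 1-D example, not the paper's 2-D implementation):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for tiny examples)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n)) for j in range(n)]

def idft(X):
    """Inverse DFT with the 1/n normalization."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * j * k / n)
                for k in range(n)) / n for j in range(n)]

def circ_conv(x, h):
    """Direct circular convolution, the spatial-domain reference."""
    n = len(x)
    return [sum(x[k] * h[(j - k) % n] for k in range(n)) for j in range(n)]

x, h = [1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0, 0.0]
direct = circ_conv(x, h)  # -> [-2.0, -2.0, 2.0, 2.0]
via_freq = [c.real for c in idft([a * b for a, b in zip(dft(x), dft(h))])]
```

Both routes produce the same output; the frequency-domain route replaces the nested convolution loop with a single element-wise product, which is the operation CEMNet trains on directly.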
    Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine. (arXiv:2204.07124v1 [stat.ML])
    Dynamic treatment regimes (DTRs) are used in medicine to tailor sequential treatment decisions to patients by considering patient heterogeneity. Common methods for learning optimal DTRs, however, have shortcomings: they are typically based on outcome prediction and not treatment effect estimation, or they use linear models that are restrictive for patient data from modern electronic health records. To address these shortcomings, we develop two novel methods for learning optimal DTRs that effectively handle complex patient data. We call our methods DTR-CT and DTR-CF. Our methods are based on a data-driven estimation of heterogeneous treatment effects using causal tree methods, specifically causal trees and causal forests, that learn non-linear relationships, control for time-varying confounding, are doubly robust, and are explainable. To the best of our knowledge, our paper is the first that adapts causal tree methods for learning optimal DTRs. We evaluate our proposed methods using synthetic data and then apply them to real-world data from intensive care units. Our methods outperform state-of-the-art baselines in terms of cumulative regret and percentage of optimal decisions by a considerable margin. Our work improves treatment recommendations from electronic health records and is thus of direct relevance for personalized medicine.  ( 2 min )
    Joint Coreset Construction and Quantization for Distributed Machine Learning. (arXiv:2204.06652v1 [cs.LG])
    Coresets are small, weighted summaries of larger datasets, aiming at providing provable error bounds for machine learning (ML) tasks while significantly reducing the communication and computation costs. To achieve a better trade-off between ML error bounds and costs, we propose the first framework to incorporate quantization techniques into the process of coreset construction. Specifically, we theoretically analyze the ML error bounds caused by a combination of coreset construction and quantization. Based on that, we formulate an optimization problem to minimize the ML error under a fixed budget of communication cost. To improve the scalability for large datasets, we identify two proxies of the original objective function, for which efficient algorithms are developed. For the case of data on multiple nodes, we further design a novel algorithm to allocate the communication budget to the nodes while minimizing the overall ML error. Through extensive experiments on multiple real-world datasets, we demonstrate the effectiveness and efficiency of our proposed algorithms for a variety of ML tasks. In particular, our algorithms have achieved more than 90% data reduction with less than 10% degradation in ML performance in most cases.  ( 2 min )
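The coreset half of the trade-off above, importance-sample a few points and reweight them so that weighted sums remain unbiased, can be sketched with stdlib Python; the "sensitivity" here is a simplified stand-in (proportional to |x|), and the quantizer is a plain uniform rounder, not the paper's optimized allocation:

```python
import random

random.seed(1)
data = [float(i) for i in range(1, 101)]  # toy 1-D dataset, sum = 5050

# Sensitivity-style sampling probabilities, here simply proportional to |x|
# (a stand-in for the true sensitivities used in coreset theory).
total = sum(abs(x) for x in data)
probs = [abs(x) / total for x in data]

def sample_coreset(m):
    """Draw m points with replacement; the weights 1/(m * p_i) make the
    weighted coreset an unbiased estimator of full-data sums."""
    idx = random.choices(range(len(data)), weights=probs, k=m)
    return [(data[i], 1.0 / (m * probs[i])) for i in idx]

def quantize(v, step):
    """Uniform scalar quantization: the kind of lossy compression the
    framework jointly optimizes with coreset construction."""
    return round(v / step) * step

coreset = sample_coreset(200)         # 200 weighted points stand in for 100... larger sets
est = sum(x * w for x, w in coreset)  # estimates sum(data) = 5050
```

Because the sampling probabilities here are exactly proportional to each point's contribution, the weighted estimate is essentially exact; with generic data the match is only approximate, which is where the paper's error-bound analysis comes in.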
    Control-oriented meta-learning. (arXiv:2204.06716v1 [cs.RO])
    Real-time adaptation is imperative to the control of robots operating in complex, dynamic environments. Adaptive control laws can endow even nonlinear systems with good trajectory tracking performance, provided that any uncertain dynamics terms are linearly parameterizable with known nonlinear features. However, it is often difficult to specify such features a priori, such as for aerodynamic disturbances on rotorcraft or interaction forces between a manipulator arm and various objects. In this paper, we turn to data-driven modeling with neural networks to learn, offline from past data, an adaptive controller with an internal parametric model of these nonlinear features. Our key insight is that we can better prepare the controller for deployment with control-oriented meta-learning of features in closed-loop simulation, rather than regression-oriented meta-learning of features to fit input-output data. Specifically, we meta-learn the adaptive controller with closed-loop tracking simulation as the base-learner and the average tracking error as the meta-objective. With both fully-actuated and underactuated nonlinear planar rotorcraft subject to wind, we demonstrate that our adaptive controller outperforms other controllers trained with regression-oriented meta-learning when deployed in closed-loop for trajectory tracking control.  ( 2 min )
    Leveraging convergence behavior to balance conflicting tasks in multi-task learning. (arXiv:2204.06698v1 [cs.LG])
    Multi-Task Learning is a learning paradigm that uses correlated tasks to improve performance generalization. A common way to learn multiple tasks is through the hard parameter sharing approach, in which a single architecture shares the same subset of parameters, creating an inductive bias between tasks during the training process. Due to its simplicity, potential to improve generalization, and reduced computational cost, it has gained the attention of the scientific and industrial communities. However, tasks often conflict with each other, which makes it challenging to define how the gradients of multiple tasks should be combined to allow simultaneous learning. To address this problem, we use the idea of multi-objective optimization to propose a method that takes into account the temporal behaviour of the gradients to create a dynamic bias that adjusts the importance of each task during backpropagation. The result is to give more attention to tasks that are diverging or that have not benefited during the last iterations, ensuring that simultaneous learning heads toward the performance maximization of all tasks. As a result, we empirically show that the proposed method outperforms state-of-the-art approaches on learning conflicting tasks. Unlike the adopted baselines, our method ensures that all tasks reach good generalization performances.  ( 2 min )
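One simple way to instantiate "give more attention to diverging tasks" is to weight each task by how far its current loss sits above its recent best; this is a hypothetical heuristic in the same spirit as the method above, not the authors' actual weighting rule:

```python
def dynamic_weights(loss_histories, eps=1e-8):
    """Per-task weights from loss trajectories: a task whose latest loss is
    far above its recent minimum (diverging or stalled) gets more weight;
    weights are normalized to sum to 1."""
    ratios = [h[-1] / (min(h) + eps) for h in loss_histories]
    total = sum(ratios)
    return [r / total for r in ratios]

hist = [[1.0, 0.5, 0.4],   # steadily improving task
        [1.0, 0.9, 1.1]]   # diverging task
w = dynamic_weights(hist)  # the diverging task receives the larger weight
```

In a training loop these weights would rescale each task's gradient before the shared backward pass, dynamically biasing updates toward the tasks that are currently losing out.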
    Multifidelity deep neural operators for efficient learning of partial differential equations with application to fast inverse design of nanoscale heat transport. (arXiv:2204.06684v1 [physics.comp-ph])
    Deep neural operators can learn operators mapping between infinite-dimensional function spaces via deep neural networks and have become an emerging paradigm of scientific machine learning. However, training neural operators usually requires a large amount of high-fidelity data, which is often difficult to obtain in real engineering problems. Here, we address this challenge by using multifidelity learning, i.e., learning from multifidelity datasets. We develop a multifidelity neural operator based on a deep operator network (DeepONet). A multifidelity DeepONet includes two standard DeepONets coupled by residual learning and input augmentation. Multifidelity DeepONet significantly reduces the required amount of high-fidelity data and achieves one order of magnitude smaller error when using the same amount of high-fidelity data. We apply a multifidelity DeepONet to learn the phonon Boltzmann transport equation (BTE), a framework to compute nanoscale heat transport. By combining a trained multifidelity DeepONet with genetic algorithm or topology optimization, we demonstrate a fast solver for the inverse design of BTE problems.  ( 2 min )
    Leveraging Natural Language Processing to Uncover Themes in Clinical Notes of Patients Admitted for Heart Failure. (arXiv:2204.07074v1 [cs.LG])
    Heart failure occurs when the heart is not able to pump blood and oxygen to support other organs in the body as it should. Treatments include medications and sometimes hospitalization. Patients with heart failure can have both cardiovascular as well as non-cardiovascular comorbidities. Clinical notes of patients with heart failure can be analyzed to gain insight into the topics discussed in these notes and the major comorbidities in these patients. In this regard, we apply machine learning techniques, such as topic modeling, to identify the major themes found in the clinical notes specific to the procedures performed on 1,200 patients admitted for heart failure at the University of Illinois Hospital and Health Sciences System (UI Health). Topic modeling revealed five hidden themes in these clinical notes, including one related to heart disease comorbidities.  ( 2 min )
    The Vision of Self-Evolving Computing Systems. (arXiv:2204.06825v1 [cs.SE])
    Computing systems are omnipresent; their sustainability has become crucial for our society. A key aspect of this sustainability is the ability of computing systems to cope with the continuous change they face, ranging from dynamic operating conditions, to changing goals, and technological progress. While we are able to engineer smart computing systems that autonomously deal with various types of changes, handling unanticipated changes requires system evolution, which remains in essence a human-centered process. This will eventually become unmanageable. To break through the status quo, we put forward an arguable opinion for the vision of self-evolving computing systems that are equipped with an evolutionary engine enabling them to evolve autonomously. Specifically, when a self-evolving computing system detects conditions outside its operational domain, such as an anomaly or a new goal, it activates an evolutionary engine that runs online experiments to determine how the system needs to evolve to deal with the changes, thereby evolving its architecture. During this process the engine can integrate new computing elements that are provided by computing warehouses. These computing elements provide specifications and procedures enabling their automatic integration. We motivate the need for self-evolving computing systems in light of the state of the art, outline a conceptual architecture of self-evolving computing systems, and illustrate the architecture for a future smart city mobility system that needs to evolve continuously with changing conditions. To conclude, we highlight key research challenges to realize the vision of self-evolving computing systems.  ( 2 min )
    Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin-picking. (arXiv:2204.07049v1 [cs.RO])
    In this paper, we propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. Given a bin-picking scenario, we establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data. With these predictions, we further design a comprehensive adaptive selection scheme to distinguish reliable results, and leverage them as pseudo labels to update a student model for pose estimation on real data. To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model. We evaluate our method on a public benchmark and our newly-released dataset, achieving an ADD(-S) improvement of 11.49% and 22.62% respectively. Our method is also able to improve robotic bin-picking success by 19.54%, demonstrating the potential of iterative sim-to-real solutions for robotic applications.  ( 2 min )
    Magnetic Resonance Spectroscopy Deep Learning Denoising Using Few In Vivo Data. (arXiv:2101.11442v2 [physics.med-ph] UPDATED)
    Magnetic Resonance Spectroscopy (MRS) is a noninvasive tool to reveal metabolic information. One challenge of 1H-MRS is the low Signal-to-Noise Ratio (SNR). To improve the SNR, a typical approach is to perform Signal Averaging (SA) with M repeated samples. The data acquisition time, however, is increased by M times accordingly, and a complete clinical MRS scan takes approximately 10 minutes at a common setting of M=128. Recently, deep learning has been introduced to improve the SNR, but most approaches use simulated data as the training set. This may hinder MRS applications since potential differences, such as acquisition system imperfections and physiological and psychological conditions, may exist between the simulated and in vivo data. Here, we propose a new scheme that purely uses repeated samples of realistic data. A deep learning model, Refusion Long Short-Term Memory (ReLSTM), was designed to learn the mapping from the low-SNR time-domain data (24 SA) to the high-SNR one (128 SA). Experiments on the in vivo brain spectra of 7 healthy subjects, 2 brain tumor patients and 1 cerebral infarction patient showed that, using only 20% of the repeated samples, the spectra denoised by ReLSTM could provide estimated metabolite concentrations comparable to 128 SA. Compared with the state-of-the-art low-rank denoising method, ReLSTM achieved lower relative error and Cramér-Rao lower bounds in quantifying some important biomarkers. In summary, ReLSTM can perform high-fidelity denoising of spectra under fast acquisition (24 SA), which would be valuable for MRS clinical studies.  ( 2 min )
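The sqrt(M) law behind the 24-vs-128 SA trade-off is easy to verify numerically: averaging M independent acquisitions shrinks the noise standard deviation by a factor of sqrt(M), so 128 SA buys roughly sqrt(128/24) ≈ 2.3x the SNR of 24 SA (toy white-noise simulation, not MRS data):

```python
import math, random

random.seed(0)
sigma = 1.0   # per-acquisition noise standard deviation (toy value)
M = 128       # number of averaged repetitions (common clinical setting)

# Empirical check: the std of the mean of M i.i.d. draws is sigma / sqrt(M).
trials = 2000
means = [sum(random.gauss(0, sigma) for _ in range(M)) / M
         for _ in range(trials)]
emp_std = (sum(m * m for m in means) / trials) ** 0.5

# SNR advantage of 128 SA over the fast 24-SA acquisition
gain_128_vs_24 = math.sqrt(128 / 24)  # approx. 2.31
```

This is exactly the gap a denoiser such as ReLSTM has to close: it must recover from 24 SA the roughly 2.3x SNR improvement that would otherwise cost 5x the scan time.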
    Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity. (arXiv:2204.06618v1 [cs.CC])
    This paper analyzes three formal models of Transformer encoders that differ in the form of their self-attention mechanism: unique hard attention (UHAT); generalized unique hard attention (GUHAT), which generalizes UHAT; and averaging hard attention (AHAT). We show that UHAT and GUHAT Transformers, viewed as string acceptors, can only recognize formal languages in the complexity class AC$^0$, the class of languages recognizable by families of Boolean circuits of constant depth and polynomial size. This upper bound subsumes Hahn's (2020) results that GUHAT cannot recognize the DYCK languages or the PARITY language, since those languages are outside AC$^0$ (Furst et al., 1984). In contrast, the non-AC$^0$ languages MAJORITY and DYCK-1 are recognizable by AHAT networks, implying that AHAT can recognize languages that UHAT and GUHAT cannot.  ( 2 min )
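The gap between unique and averaging hard attention can be illustrated directly: attending to a single maximizing position yields one value, while averaging over all maximizing positions can compute fractions, which is what lets AHAT recognize MAJORITY (toy scores and values below; this is an illustration of the attention mechanisms, not the paper's formal acceptor constructions):

```python
def unique_hard_attention(scores, values):
    """UHAT-style: attend only to one highest-scoring position
    (ties broken to the leftmost position)."""
    best = max(range(len(scores)), key=lambda i: (scores[i], -i))
    return values[best]

def averaging_hard_attention(scores, values):
    """AHAT-style: average the values at ALL maximum-scoring positions.
    With uniform scores this computes the mean of the whole sequence."""
    m = max(scores)
    idx = [i for i, s in enumerate(scores) if s == m]
    return sum(values[i] for i in idx) / len(idx)

# Uniform scores over a binary string: AHAT recovers the fraction of 1s,
# so a threshold at 1/2 decides MAJORITY; UHAT sees only a single bit.
bits = [1, 0, 1, 1]
frac = averaging_hard_attention([0.0] * 4, bits)   # -> 0.75
one_bit = unique_hard_attention([0.0] * 4, bits)   # -> 1 (leftmost position)
```

A single attended position carries constant information regardless of input length, which is intuitively why UHAT/GUHAT stay inside AC^0 while AHAT does not.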
    Network state Estimation using Raw Video Analysis: vQoS-GAN based non-intrusive Deep Learning Approach. (arXiv:2204.07062v1 [cs.MM])
    Content-based providers transmit real-time, complex signals such as video data from one region to another. During this transmission, the signals often end up distorted or degraded, so that the actual information present in the video is lost. This commonly happens in streaming video applications. Hence there is a need to know the level of degradation on the receiver side. This video degradation can be estimated from network state parameters such as data rate and packet loss. Our proposed solution, vQoS-GAN (video Quality of Service Generative Adversarial Network), can estimate the network state parameters from the degraded received video data using a deep learning approach based on a semi-supervised generative adversarial network. A robust and unique deep learning network model has been trained on the video data along with data rate and packet loss class labels, and achieves over 95 percent training accuracy. The proposed semi-supervised generative adversarial network can additionally reconstruct the degraded video data to its original form for a better end-user experience.  ( 2 min )
    Clifford Circuits can be Properly PAC Learned if and only if $\textsf{RP}=\textsf{NP}$. (arXiv:2204.06638v1 [quant-ph])
    Given a dataset of input states, measurements, and probabilities, is it possible to efficiently predict the measurement probabilities associated with a quantum circuit? Recent work of Caro and Datta (2020) studied the problem of PAC learning quantum circuits in an information theoretic sense, leaving open questions of computational efficiency. In particular, one candidate class of circuits for which an efficient learner might have been possible was that of Clifford circuits, since the corresponding set of states generated by such circuits, called stabilizer states, are known to be efficiently PAC learnable (Rocchetto 2018). Here we provide a negative result, showing that proper learning of CNOT circuits is hard for classical learners unless $\textsf{RP} = \textsf{NP}$. As the classical analogue and subset of Clifford circuits, this naturally leads to a hardness result for Clifford circuits as well. Additionally, we show that if $\textsf{RP} = \textsf{NP}$ then there would exist efficient proper learning algorithms for CNOT and Clifford circuits. By similar arguments, we also find that an efficient proper quantum learner for such circuits exists if and only if $\textsf{NP} \subseteq \textsf{RQP}$.  ( 2 min )
    EEG-ITNet: An Explainable Inception Temporal Convolutional Network for Motor Imagery Classification. (arXiv:2204.06947v1 [cs.LG])
    In recent years, neural networks, and especially deep architectures, have received substantial attention for EEG signal analysis in the field of brain-computer interfaces (BCIs). In this ongoing research area, end-to-end models are favoured over traditional approaches that require signal transformation before classification, as they can eliminate the need for prior expert information and the extraction of handcrafted features. However, although several deep learning algorithms achieving high accuracy for classifying motor movements or mental tasks have been proposed in the literature, they often lack interpretability and therefore are not well favoured by the neuroscience community. Possible reasons are the high number of parameters and the tendency of deep neural networks to capture tiny yet unrelated discriminative features. We propose an end-to-end deep learning architecture called EEG-ITNet and a more comprehensible method to visualise the patterns the network learns. Using inception modules and causal convolutions with dilation, our model can extract rich spectral, spatial, and temporal information from multi-channel EEG signals with less complexity (in terms of the number of trainable parameters) than other existing end-to-end architectures, such as EEG-Inception and EEG-TCNet. In an exhaustive evaluation on dataset 2a from BCI competition IV and the OpenBMI motor imagery dataset, EEG-ITNet shows up to 5.9% improvement in classification accuracy over its competitors in different scenarios, with statistical significance. We also comprehensively explain and support the validity of the network visualisations from a neuroscientific perspective. Our code is openly available at https://github.com/AbbasSalami/EEG-ITNet  ( 2 min )
    Sketch guided and progressive growing GAN for realistic and editable ultrasound image synthesis. (arXiv:2204.06929v1 [eess.IV])
    Ultrasound (US) imaging is widely used for anatomical structure inspection in clinical diagnosis. The training of new sonographers and deep learning based algorithms for US image analysis usually requires a large amount of data. However, obtaining and labeling large-scale US imaging data are not easy tasks, especially for diseases with low incidence. Realistic US image synthesis can alleviate this problem to a great extent. In this paper, we propose a generative adversarial network (GAN) based image synthesis framework. Our main contributions include: 1) we present the first work that can synthesize realistic B-mode US images with high-resolution and customized texture editing features; 2) to enhance structural details of generated images, we propose to introduce auxiliary sketch guidance into a conditional GAN. We superpose the edge sketch onto the object mask and use the composite mask as the network input; 3) to generate high-resolution US images, we adopt a progressive training strategy to gradually generate high-resolution images from low-resolution images. In addition, a feature loss is proposed to minimize the difference of high-level features between the generated and real images, which further improves the quality of generated images; 4) the proposed US image synthesis method is quite universal and can also be generalized to the US images of other anatomical structures besides the three tested in our study (lung, hip joint, and ovary); 5) extensive experiments on three large US image datasets are conducted to validate our method. Ablation studies, customized texture editing, user studies, and segmentation tests demonstrate promising results of our method in synthesizing realistic US images.  ( 2 min )
    Modularity benefits reinforcement learning agents with competing homeostatic drives. (arXiv:2204.06608v1 [cs.LG])
    The problem of balancing conflicting needs is fundamental to intelligence. Standard reinforcement learning algorithms maximize a scalar reward, which requires combining different objective-specific rewards into a single number. Alternatively, different objectives could also be combined at the level of action value, such that specialist modules responsible for different objectives submit different action suggestions to a decision process, each based on rewards that are independent of one another. In this work, we explore the potential benefits of this alternative strategy. We investigate a biologically relevant multi-objective problem, the continual homeostasis of a set of variables, and compare a monolithic deep Q-network to a modular network with a dedicated Q-learner for each variable. We find that the modular agent: a) requires minimal exogenously determined exploration; b) has improved sample efficiency; and c) is more robust to out-of-domain perturbation.  ( 2 min )
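The modular alternative described above, specialist modules voting with action values rather than merging rewards into one scalar, can be sketched in a few lines (the Q-values below are illustrative numbers, not outputs of the paper's trained Q-learners):

```python
def select_action(module_qs):
    """Each homeostatic module submits Q-values over the shared action set;
    the agent executes the action with the highest summed value, so no
    module's objective is collapsed into a single scalar reward upstream."""
    n_actions = len(module_qs[0])
    totals = [sum(q[a] for q in module_qs) for a in range(n_actions)]
    return max(range(n_actions), key=totals.__getitem__)

# Two modules (e.g. two homeostatic variables), three candidate actions:
# module 0 strongly prefers action 0, module 1 mildly prefers action 2.
qs = [[1.0, 0.2, 0.0],
      [0.0, 0.5, 0.9]]
action = select_action(qs)  # totals [1.0, 0.7, 0.9] -> action 0
```

Each module is trained on its own reward stream; only the decision step combines them, which is the structural difference from a monolithic deep Q-network with a summed reward.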
    Learning Invariances with Generalised Input-Convex Neural Networks. (arXiv:2204.07009v1 [cs.LG])
    Considering smooth mappings from input vectors to continuous targets, our goal is to characterise subspaces of the input domain, which are invariant under such mappings. Thus, we want to characterise manifolds implicitly defined by level sets. Specifically, this characterisation should be of a global parametric form, which is especially useful for different informed data exploration tasks, such as building grid-based approximations, sampling points along the level curves, or finding trajectories on the manifold. However, global parameterisations can only exist if the level sets are connected. For this purpose, we introduce a novel and flexible class of neural networks that generalise input-convex networks. These networks represent functions that are guaranteed to have connected level sets forming smooth manifolds on the input space. We further show that global parameterisations of these level sets can always be found efficiently. Lastly, we demonstrate that our novel technique for characterising invariances is a powerful generative data exploration tool in real-world applications, such as computational chemistry.  ( 2 min )
    Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems. (arXiv:2204.07135v1 [cs.LG])
    Skill routing is an important component in large-scale conversational systems. In contrast to traditional rule-based skill routing, state-of-the-art systems use a model-based approach to enable natural conversations. To provide the supervision signal required to train such models, ideas such as human annotation, replication of a rule-based system, relabeling based on user paraphrases, and bandit-based learning were suggested. However, these approaches: (a) do not scale in terms of the number of skills and skill on-boarding, (b) require very costly expert annotation/rule design, and (c) introduce risks in the user experience with each model update. In this paper, we present a scalable self-learning approach to explore routing alternatives without causing abrupt policy changes that break the user experience, learn from the user interaction, and incrementally improve the routing via frequent model refreshes. To enable such robust frequent model updates, we suggest a simple and effective approach that ensures controlled policy updates for individual domains, followed by an off-policy evaluation for making deployment decisions without any need for lengthy A/B experimentation. We conduct various offline and online A/B experiments on a commercial large-scale conversational system to demonstrate the effectiveness of the proposed method in real-world production settings.
    SVAM: Saliency-guided Visual Attention Modeling by Autonomous Underwater Robots. (arXiv:2011.06252v2 [cs.CV] UPDATED)
    This paper presents a holistic approach to saliency-guided visual attention modeling (SVAM) for use by autonomous underwater robots. Our proposed model, named SVAM-Net, integrates deep visual features at various scales and semantics for effective salient object detection (SOD) in natural underwater images. The SVAM-Net architecture is configured in a unique way to jointly accommodate bottom-up and top-down learning within two separate branches of the network while sharing the same encoding layers. We design dedicated spatial attention modules (SAMs) along these learning pathways to exploit the coarse-level and fine-level semantic features for SOD at four stages of abstraction. The bottom-up branch performs a rough yet reasonably accurate saliency estimation at a fast rate, whereas the deeper top-down branch incorporates a residual refinement module (RRM) that provides fine-grained localization of the salient objects. Extensive performance evaluation of SVAM-Net on benchmark datasets clearly demonstrates its effectiveness for underwater SOD. We also validate its generalization performance on data from several ocean trials, which include test images of diverse underwater scenes and waterbodies as well as images with unseen natural objects. Moreover, we analyze its computational feasibility for robotic deployments and demonstrate its utility in several important use cases of visual attention modeling.
    Ensemble learning using individual neonatal data for seizure detection. (arXiv:2204.07043v1 [eess.SP])
    Sharing medical data between institutions is difficult in practice due to data protection laws and official procedures within institutions. Therefore, most existing algorithms are trained on relatively small electroencephalogram (EEG) data sets, which is likely to be detrimental to prediction accuracy. In this work, we simulate a case where the data cannot be shared by splitting the publicly available data set into disjoint sets representing data in individual institutions. We propose to train a (local) detector in each institution and aggregate their individual predictions into one final prediction. Four aggregation schemes are compared, namely, the majority vote, the mean, the weighted mean and the Dawid-Skene method. The approach allows different detector architectures amongst the institutions. The method was validated on an independent data set using only a subset of EEG channels. The ensemble reaches accuracy comparable to a single detector trained on all the data when a sufficient amount of data is available in each institution. The weighted mean aggregation scheme showed the best overall performance; it was only marginally outperformed by the Dawid-Skene method when local detectors approached the performance of a single detector trained on all available data.
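Three of the four aggregation schemes compared above fit in a few lines each (the Dawid-Skene method needs an EM loop and is omitted; the per-institution probabilities and reliability weights below are made-up illustrations):

```python
def majority_vote(labels):
    """Binary seizure / no-seizure votes from each institution's detector;
    ties go to the positive class."""
    return 1 if 2 * sum(labels) >= len(labels) else 0

def mean_agg(probs):
    """Unweighted mean of the local detectors' seizure probabilities."""
    return sum(probs) / len(probs)

def weighted_mean(probs, weights):
    """Weighted mean: each local detector is weighted, e.g. by its
    performance on held-out validation data."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)

probs = [0.9, 0.4, 0.6]     # hypothetical per-institution probabilities
weights = [2.0, 1.0, 1.0]   # hypothetical reliability weights
fused = weighted_mean(probs, weights)  # -> 0.7
```

Note that the aggregation step needs only each institution's prediction, never the raw EEG, which is what makes the scheme compatible with data protection constraints.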
    Surrogate NAS Benchmarks: Going Beyond the Limited Search Spaces of Tabular NAS Benchmarks. (arXiv:2008.09777v4 [cs.LG] UPDATED)
    The most significant barrier to the advancement of Neural Architecture Search (NAS) is its demand for large computational resources, which hinders scientifically sound empirical evaluations of NAS methods. Tabular NAS benchmarks have alleviated this problem substantially, making it possible to properly evaluate NAS methods in seconds on commodity machines. However, an unintended consequence of tabular NAS benchmarks has been a focus on extremely small architectural search spaces since their construction relies on exhaustive evaluations of the space. This leads to unrealistic results that do not transfer to larger spaces. To overcome this fundamental limitation, we propose a methodology to create cheap NAS surrogate benchmarks for arbitrary search spaces. We exemplify this approach by creating surrogate NAS benchmarks on the existing tabular NAS-Bench-101 and on two widely used NAS search spaces with up to $10^{21}$ architectures ($10^{13}$ times larger than any previous tabular NAS benchmark). We show that surrogate NAS benchmarks can model the true performance of architectures better than tabular benchmarks (at a small fraction of the cost), that they lead to faithful estimates of how well different NAS methods work on the original non-surrogate benchmark, and that they can generate new scientific insight. We open-source all our code and believe that surrogate NAS benchmarks are an indispensable tool to extend scientifically sound work on NAS to large and exciting search spaces.
    Incompleteness of graph convolutional neural networks for points clouds in three dimensions. (arXiv:2201.07136v2 [stat.ML] UPDATED)
    Graph neural networks (GNNs) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct graph representation for molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a chosen cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph NNs" (dGNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML, with all known indistinguishable graphs being resolved in the fully-connected limit. Here we show that even for the restricted case of fully-connected graphs induced by 3D atom clouds dGNNs are not complete. We construct pairs of distinct point clouds that generate graphs that, for any cutoff radius, are equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, setting an ultimate limit to the expressive power of some of the well-established GNN architectures for atomistic machine learning. Models that explicitly use angular or directional information in the description of atomic environments can resolve these degeneracies.
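The first-order Weisfeiler-Lehman test at the heart of this argument is easy to state in code. The sketch below (an illustrative implementation, not the authors') runs 1-WL color refinement on a fully-connected distance graph; clouds with different final color multisets are distinguishable, while the paper constructs distinct clouds that this test cannot separate.

```python
import numpy as np

def pairwise(points):
    """Euclidean distance matrix of a point cloud."""
    pts = np.asarray(points, dtype=float)
    return np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

def wl_colors(dist, rounds=3, decimals=6):
    """First-order Weisfeiler-Lehman refinement on a fully-connected
    distance graph: every node starts with the same color, and each round
    a node's new color encodes the multiset of its neighbors'
    (color, rounded distance) pairs. Different final color multisets
    imply distinguishable graphs; identical multisets do NOT imply
    identical clouds, which is the degeneracy the paper exploits."""
    n = dist.shape[0]
    colors = [0] * n
    for _ in range(rounds):
        sigs = []
        for i in range(n):
            sig = tuple(sorted((colors[j], round(float(dist[i, j]), decimals))
                               for j in range(n) if j != i))
            sigs.append(sig)
        lut = {s: k for k, s in enumerate(sorted(set(sigs)))}  # relabel
        colors = [lut[s] for s in sigs]
    return sorted(colors)
```

For example, the four vertices of a unit square all receive one color (they are WL-equivalent), while four collinear points split into endpoint and interior colors.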
    Global Counterfactual Explanations: Investigations, Implementations and Improvements. (arXiv:2204.06917v1 [cs.LG])
    Counterfactual explanations have been widely studied in explainability, with a range of application dependent methods emerging in fairness, recourse and model understanding. However, the major shortcoming associated with these methods is their inability to provide explanations beyond the local or instance-level. While some works touch upon the notion of a global explanation, typically suggesting to aggregate masses of local explanations in the hope of ascertaining global properties, few provide frameworks that are either reliable or computationally tractable. Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to investigate existing global methods, with a focus on implementing and improving Actionable Recourse Summaries (AReS), the only known global counterfactual explanation framework for recourse.
    Sparse Interaction Neighborhood Selection for Markov Random Fields via Reversible Jump and Pseudoposteriors. (arXiv:2204.05933v2 [stat.CO] UPDATED)
    We consider the problem of estimating the interacting neighborhood of a Markov Random Field model with finite support and homogeneous pairwise interactions based on relative positions of a two-dimensional lattice. Using a Bayesian framework, we propose a Reversible Jump Monte Carlo Markov Chain algorithm that jumps across subsets of a maximal range neighborhood, allowing us to perform model selection based on a marginal pseudoposterior distribution of models.
    Improving Computational Complexity in Statistical Models with Second-Order Information. (arXiv:2202.04219v3 [stat.ML] UPDATED)
    It is known that when the statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes a polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory for the application. To further improve that computational complexity, we consider the utilization of the second-order information in the design of optimization algorithms. Specifically, we study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models, which is a variant of gradient descent algorithm whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function of statistical models. When the population loss function, i.e., the limit of the empirical loss function when $n$ goes to infinity, is homogeneous in all directions, we demonstrate that the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius. This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau > 1$, to reach the same statistical radius. We illustrate our general theory under two statistical models: generalized linear models and mixture models, and experimental results support our general theory.
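As a concrete illustration of the update rule (a sketch under our own assumptions, not the paper's code), here is NormGD with the step size divided by the largest Hessian eigenvalue at the current iterate:

```python
import numpy as np

def normgd(grad, hess, x0, eta=1.0, iters=100):
    """Normalized gradient descent: each step's learning rate is divided
    by the largest eigenvalue of the Hessian of the loss, as described in
    the abstract. `grad` and `hess` are callables returning the gradient
    vector and (symmetric) Hessian matrix; illustrative only."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        lam = max(np.linalg.eigvalsh(hess(x)).max(), 1e-12)  # top eigenvalue
        x = x - (eta / lam) * grad(x)
    return x
```

For the toy singular loss $f(\theta)=\|\theta\|^4$, the gradient is $4\|\theta\|^2\theta$ and the Hessian is $4\|\theta\|^2 I + 8\theta\theta^\top$, so with $\eta = 1$ each normalized step multiplies $\theta$ by $2/3$: geometric contraction, hence the logarithmic iteration count, where fixed-step gradient descent slows down polynomially near the degenerate optimum.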
    Ranking Feature-Block Importance in Artificial Multiblock Neural Networks. (arXiv:2109.10279v2 [cs.LG] UPDATED)
    In artificial neural networks, understanding the contributions of input features on the prediction fosters model explainability and delivers relevant information about the dataset. While typical setups for feature importance ranking assess input features individually, in this study, we go one step further and rank the importance of groups of features, denoted as feature-blocks. A feature-block can contain features of a specific type or features derived from a particular source, which are presented to the neural network in separate input branches (multiblock ANNs). This work presents three methods pursuing distinct strategies to rank features in multiblock ANNs by their importance: (1) a composite strategy building on individual feature importance rankings, (2) a knock-in, and (3) a knock-out strategy. While the composite strategy builds on state-of-the-art feature importance rankings, knock-in and knock-out strategies evaluate the block as a whole via a mutual information criterion. Our experiments consist of a simulation study validating all three approaches, followed by a case study on two distinct real-world datasets to compare the strategies. We conclude that each strategy has its merits for specific application scenarios.
    Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy, and Multi-Fidelity Feedback. (arXiv:2110.07232v2 [cs.LG] UPDATED)
    In black-box optimization problems, we aim to maximize an unknown objective function, where the function is only accessible through feedback from an evaluation or simulation oracle. In real life, such feedback is often noisy and becomes available only after some unknown delay that may depend on the computation time of the oracle. Additionally, if exact evaluations are expensive but coarse approximations are available at a lower cost, the feedback can be multi-fidelity. In order to address this problem, we propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS), that flexibly accommodates a delay and noise-tolerant bandit algorithm. We provide a generic proof technique to quantify regret of PCTS under delayed, noisy, and multi-fidelity feedback. Specifically, we derive regret bounds of PCTS enabled with delayed-UCB1 (DUCB1) and delayed-UCB-V (DUCBV) algorithms. Given a horizon $T$, PCTS retains the regret bound of non-delayed HOO for expected delay of $O(\log T)$ and worsens by $O(T^{\frac{1-\alpha}{d+2}})$ for expected delays of $O(T^{1-\alpha})$ for $\alpha \in (0,1]$. We experimentally validate on multiple synthetic functions and hyperparameter tuning problems that PCTS outperforms the state-of-the-art black-box optimization methods for feedback with different noise levels, delays, and fidelity.
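To make the delayed-feedback component concrete, here is a toy fixed-delay variant of UCB1 on Bernoulli arms. This is our own simplified stand-in for the DUCB1 building block; PCTS itself couples such a bandit with hierarchical tree search over the domain.

```python
import math
import random
from collections import defaultdict

def delayed_ucb1(means, T, delay, seed=0):
    """UCB1 with a fixed observation delay: the reward of each pull is
    revealed `delay` rounds later. Arms are Bernoulli with the given
    means; returns the average collected reward. Toy sketch only."""
    rng = random.Random(seed)
    K = len(means)
    counts, sums = [0] * K, [0.0] * K
    pending = defaultdict(list)              # arrival round -> [(arm, reward)]
    total = 0.0
    for t in range(T):
        for arm, r in pending.pop(t, []):    # absorb feedback arriving now
            counts[arm] += 1
            sums[arm] += r
        if 0 in counts:                      # play each unobserved arm first
            arm = counts.index(0)
        else:
            arm = max(range(K), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t + 1) / counts[a]))
        r = 1.0 if rng.random() < means[arm] else 0.0
        total += r
        pending[t + delay].append((arm, r))  # reward observed later
    return total / T
```

The only change from classical UCB1 is the `pending` queue: indices are computed from the feedback that has arrived so far, so a modest delay costs a bounded amount of extra exploration.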
    Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values. (arXiv:2109.10431v2 [cs.LG] UPDATED)
    We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.
    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. (arXiv:2202.13001v3 [cs.LG] UPDATED)
    We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. The tasks may be designed by an adversary, but the adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting), and the number of tasks $N$ as well as the total number of rounds $T$ are known ($N$ could be unknown in the meta-learning setting). We design an algorithm based on a reduction to bandit submodular maximization, and show that its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(N\sqrt{M \tau}+N^{2/3})$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2})$ regret.
    Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback. (arXiv:2204.06660v1 [cs.LG])
    We study the problem of expert advice under the partial bandit feedback setting and create a sequential minimax optimal algorithm. Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner. Our algorithm adopts a universal prediction perspective, whose performance is analyzed with regret against a general expert selection sequence. The regret we study is against a general competition class that covers many settings (such as the switching or contextual experts settings) and the expert selection sequences in the competition class are determined by the application at hand. Our regret bounds are second order bounds in terms of the sum of squared losses and the normalized regret of our algorithm is invariant under arbitrary affine transforms of the loss sequence. Our algorithm is truly online and does not use any preliminary information about the loss sequences.  ( 2 min )
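For orientation, the classical baseline in this setting is exponential weights with importance-weighted loss estimates (EXP3). The sketch below is that baseline, not the paper's second-order algorithm; its regret scales with the horizon rather than with the sum of squared losses, which is exactly the gap second-order bounds close.

```python
import math
import random

def exp3(n_arms, loss_fn, T, eta=0.1, seed=0):
    """EXP3-style exponential weights under bandit feedback: only the
    chosen arm's loss (in [0, 1]) is observed, and an importance-weighted
    estimate updates the weights. Classical baseline, sketch only."""
    rng = random.Random(seed)
    w = [1.0] * n_arms
    total = 0.0
    for t in range(T):
        s = sum(w)
        p = [wi / s for wi in w]
        arm = rng.choices(range(n_arms), weights=p)[0]
        loss = loss_fn(t, arm)                     # observed for `arm` only
        total += loss
        w[arm] *= math.exp(-eta * loss / p[arm])   # importance-weighted update
    return total / T
```

Run against an adversary whose arm-0 loss is always 0, the average loss converges toward that of the best fixed arm.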
    To Split or Not to Split: The Impact of Disparate Treatment in Classification. (arXiv:2002.04788v4 [cs.LG] UPDATED)
    Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions where a group-blind classifier will always suffer from a non-trivial performance gap from the split classifiers. In the finite sample regime, splitting is not necessarily beneficial and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.  ( 2 min )
    Classification of Hyperspectral Images Using SVM with Shape-adaptive Reconstruction and Smoothed Total Variation. (arXiv:2203.15619v3 [cs.CV] UPDATED)
    In this work, a novel algorithm called SVM with Shape-adaptive Reconstruction and Smoothed Total Variation (SaR-SVM-STV) is introduced to classify hyperspectral images, which makes full use of spatial and spectral information. The Shape-adaptive Reconstruction (SaR) is introduced to preprocess each pixel based on the Pearson Correlation between pixels in its shape-adaptive (SA) region. Support Vector Machines (SVMs) are trained to estimate the pixel-wise probability maps of each class. Then the Smoothed Total Variation (STV) model is applied to denoise and generate the final classification map. Experiments show that SaR-SVM-STV outperforms the SVM-STV method with a few training labels, demonstrating the significance of reconstructing hyperspectral images before classification.  ( 2 min )
    Data Augmentation for Bayesian Deep Learning. (arXiv:1903.09668v3 [stat.ML] UPDATED)
    Deep Learning (DL) methods have emerged as one of the most powerful tools for functional approximation and prediction. While the representation properties of DL have been well studied, uncertainty quantification remains challenging and largely unexplored. Data augmentation techniques are a natural approach to provide uncertainty quantification and to incorporate stochastic Monte Carlo search into stochastic gradient descent (SGD) methods. The purpose of our paper is to show that training DL architectures with data augmentation leads to efficiency gains. We use the theory of scale mixtures of normals to derive data augmentation strategies for deep learning. This allows variants of the expectation-maximization and MCMC algorithms to be brought to bear on these high dimensional nonlinear deep learning models. To demonstrate our methodology, we develop data augmentation algorithms for a variety of commonly used activation functions: logit, ReLU, leaky ReLU and SVM. Our methodology is compared to traditional stochastic gradient descent with back-propagation. Our optimization procedure leads to a version of iteratively re-weighted least squares and can be implemented at scale with accelerated linear algebra methods providing substantial improvement in speed. We illustrate our methodology on a number of standard datasets. Finally, we conclude with directions for future research.  ( 2 min )
    Observable adjustments in single-index models for regularized M-estimators. (arXiv:2204.06990v1 [math.ST])
    We consider observations $(X,y)$ from single index models with unknown link function, Gaussian covariates and a regularized M-estimator $\hat\beta$ constructed from a convex loss function and regularizer. In the regime where sample size $n$ and dimension $p$ are both increasing such that $p/n$ has a finite limit, the behavior of the empirical distribution of $\hat\beta$ and the predicted values $X\hat\beta$ has been previously characterized in a number of models: The empirical distributions are known to converge to proximal operators of the loss and penalty in a related Gaussian sequence model, which captures the interplay between ratio $p/n$, loss, regularization and the data generating process. This connection between $(\hat\beta,X\hat\beta)$ and the corresponding proximal operators requires solving fixed-point equations that typically involve unobservable quantities such as the prior distribution on the index or the link function. This paper develops a different theory to describe the empirical distribution of $\hat\beta$ and $X\hat\beta$: Approximations of $(\hat\beta,X\hat\beta)$ in terms of proximal operators are provided that only involve observable adjustments. These proposed observable adjustments are data-driven, e.g., do not require prior knowledge of the index or the link function. These new adjustments yield confidence intervals for individual components of the index, as well as estimators of the correlation of $\hat\beta$ with the index. The interplay between loss, regularization and the model is thus captured in a data-driven manner, without solving the fixed-point equations studied in previous works. The results apply to both strongly convex regularizers and unregularized M-estimation. Simulations are provided for the square and logistic loss in single index models including logistic regression and 1-bit compressed sensing with 20\% corrupted bits.  ( 2 min )
    Concentration of Random Feature Matrices in High-Dimensions. (arXiv:2204.06935v1 [stat.ML])
    The spectra of random feature matrices provide essential information on the conditioning of the linear system used in random feature regression problems and are thus connected to the consistency and generalization of random feature models. Random feature matrices are asymmetric rectangular nonlinear matrices depending on two input variables, the data and the weights, which can make their characterization challenging. We consider two settings for the two input variables, either both are random variables or one is a random variable and the other is well-separated, i.e. there is a minimum distance between points. With conditions on the dimension, the complexity ratio, and the sampling variance, we show that the singular values of these matrices concentrate near their full expectation and near one with high probability. In particular, since the dimension depends only on the logarithm of the number of random weights or the number of data points, our complexity bounds can be achieved even in moderate dimensions for many practical settings. The theoretical results are verified with numerical experiments.  ( 2 min )
    Modelling Non-Smooth Signals with Complex Spectral Structure. (arXiv:2203.06997v2 [stat.ML] UPDATED)
    The Gaussian Process Convolution Model (GPCM; Tobar et al., 2015a) is a model for signals with complex spectral structure. A significant limitation of the GPCM is that it assumes a rapidly decaying spectrum: it can only model smooth signals. Moreover, inference in the GPCM currently requires (1) a mean-field assumption, resulting in poorly calibrated uncertainties, and (2) a tedious variational optimisation of large covariance matrices. We redesign the GPCM model to induce a richer distribution over the spectrum with relaxed assumptions about smoothness: the Causal Gaussian Process Convolution Model (CGPCM) introduces a causality assumption into the GPCM, and the Rough Gaussian Process Convolution Model (RGPCM) can be interpreted as a Bayesian nonparametric generalisation of the fractional Ornstein-Uhlenbeck process. We also propose a more effective variational inference scheme, going beyond the mean-field assumption: we design a Gibbs sampler which directly samples from the optimal variational solution, circumventing any variational optimisation entirely. The proposed variations of the GPCM are validated in experiments on synthetic and real-world data, showing promising results.  ( 2 min )
    Wassmap: Wasserstein Isometric Mapping for Image Manifold Learning. (arXiv:2204.06645v1 [cs.LG])
    In this paper, we propose Wasserstein Isometric Mapping (Wassmap), a parameter-free nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise quadratic Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. Additionally, we show that a discrete version of the algorithm retrieves parameters from manifolds generated from discrete measures by providing a theoretical bridge to transfer recovery results from functional data to discrete data. Testing of the proposed algorithms on various image data manifolds show that Wassmap yields good embeddings compared with other global techniques.  ( 2 min )
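The pipeline is short enough to sketch end-to-end for one-dimensional measures (a toy illustration under our own simplifications; the paper treats images as measures in 2-D Wasserstein space): compute pairwise quadratic Wasserstein distances, double-center the squared distance matrix, and embed via its top eigenvectors, i.e. classical MDS.

```python
import numpy as np

def w2_1d(x, y):
    """Quadratic Wasserstein distance between two equal-size 1-D samples,
    computed via the sorted (quantile) coupling."""
    x, y = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((x - y) ** 2)))

def wassmap(samples, dim=1):
    """Toy Wassmap: pairwise squared W2 distances, double centering,
    then a classical-MDS eigen-embedding into `dim` dimensions."""
    n = len(samples)
    D2 = np.array([[w2_1d(a, b) ** 2 for b in samples] for a in samples])
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J                    # Gram matrix of the embedding
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]       # top `dim` eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

Translates of a fixed sample are recovered exactly up to a rigid motion, mirroring the paper's translation-manifold recovery result.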
    Regret, stability & fairness in matching markets with bandit learners. (arXiv:2102.06246v2 [cs.LG] UPDATED)
    Making an informed decision -- for example, when choosing a career or housing -- requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition -- namely, costs and transfers -- we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.  ( 2 min )
    Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms. (arXiv:2204.06664v1 [stat.ML])
    Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is simply collecting more representative training data. In this paper, we consider the task of assembling a training dataset in which minority groups are adequately represented from a given set of data sources. In essence, this is an adaptive sampling problem to determine if a given point lies in the convex hull of the means from a set of unknown distributions. We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources. We also demonstrate the efficacy of our policies in simulations in both the Bernoulli and multinomial settings.  ( 2 min )
    Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings. (arXiv:2104.08928v2 [stat.ML] UPDATED)
    Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retailing to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient is tested positive for a disease. Intuitively, we expect that only a small number of domain-specific words may have new meanings/usages. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our estimator, proving that it can achieve the same accuracy (compared to not transfer learning) with substantially less domain-specific data when only a small number of embeddings are altered between domains. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate the effectiveness of our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.  ( 2 min )
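The group-sparse penalty at the core of the estimator acts row-wise on the matrix of domain-specific embedding corrections. Its proximal operator, used inside proximal-gradient solvers, has the closed form sketched below (our illustration of the standard group soft-thresholding operator, not the authors' implementation):

```python
import numpy as np

def prox_group_lasso(B, lam):
    """Proximal operator of the (l2,1) group-sparse penalty: each row of
    B (e.g. one word's embedding correction) is shrunk toward zero as a
    group, so entire rows vanish unless their norm exceeds `lam`."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return B * scale
```

This is exactly the mechanism by which only a small number of words, such as "positive" in medical notes, end up with domain-specific embeddings while the rest are transferred unchanged.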
    Finding MNEMON: Reviving Memories of Node Embeddings. (arXiv:2204.06963v1 [cs.LG])
    Previous security research efforts orbiting around graphs have focused exclusively on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understanding the privacy risks of integrating the output from graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In this paper, we fill this gap and propose a novel model-agnostic graph recovery attack that exploits the implicit graph structural information preserved in the embeddings of graph nodes. We show that an adversary can recover edges with decent accuracy by only gaining access to the node embedding matrix of the original graph without interactions with the node embedding models. We demonstrate the effectiveness and applicability of our graph recovery attack through extensive experiments.  ( 2 min )
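The premise that node embeddings leak edges can be demonstrated with a toy similarity-thresholding attack (a deliberately simplified stand-in; MNEMON itself is model-agnostic and considerably more sophisticated):

```python
import numpy as np

def recover_edges(emb, avg_degree):
    """Predict an edge between the node pairs with highest cosine
    similarity, keeping enough pairs to match an assumed average degree.
    Only illustrates that embeddings preserve structure; not MNEMON."""
    n = emb.shape[0]
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T                      # pairwise cosine similarity
    pairs = [(sim[i, j], i, j) for i in range(n) for j in range(i + 1, n)]
    pairs.sort(reverse=True)                 # most similar pairs first
    k = int(round(n * avg_degree / 2))       # expected number of edges
    return {(i, j) for _, i, j in pairs[:k]}
```

Because embedding objectives pull connected nodes together, even this crude thresholding recovers cluster-internal edges; the paper shows a far stronger recovery is possible without any access to the embedding model.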
    Streamable Neural Audio Synthesis With Non-Causal Convolutions. (arXiv:2204.07064v1 [cs.SD])
    Deep learning models are mostly used in an offline inference fashion. However, this strongly limits the use of these models inside audio generation setups, since most creative workflows are based on real-time digital signal processing. Although approaches based on recurrent networks can be naturally adapted to this buffer-based computation, the use of convolutions still poses some serious challenges. To tackle this issue, the use of causal streaming convolutions has been proposed. However, this requires specific, more complex training and can impact the resulting audio quality. In this paper, we introduce a new method for producing non-causal streaming models. This makes any convolutional model compatible with real-time buffer-based processing. As our method is based on a post-training reconfiguration of the model, we show that it is able to transform models trained without causal constraints into streaming models. We show how our method can be adapted to fit complex architectures with parallel branches. To evaluate our method, we apply it to the recent RAVE model, which provides high-quality real-time audio synthesis. We test our approach on multiple music and speech datasets and show that it is faster than overlap-add methods, while having no impact on the generation quality. Finally, we introduce two open-source implementations of our work as Max/MSP and PureData externals, and as a VST audio plugin. This allows traditional digital audio workstations to be endowed with real-time neural audio synthesis on a laptop CPU.  ( 2 min )
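The basic mechanism that makes convolutions streamable is cached left context: carrying the tail of the previous buffer into the next one so chunked outputs match offline computation exactly. The sketch below shows this for a causal 1-D convolution (an assumption-laden toy, not the RAVE tooling; the paper's contribution is handling the non-causal case via a post-training reconfiguration that introduces internal delay).

```python
import numpy as np

class StreamingConv1d:
    """Buffer-based causal 1-D convolution with cached left context, so
    processing a signal chunk by chunk reproduces the offline result.
    Assumes a kernel of length >= 2."""
    def __init__(self, kernel):
        self.kernel = np.asarray(kernel, dtype=float)
        self.cache = np.zeros(len(self.kernel) - 1)   # left context

    def process(self, chunk):
        x = np.concatenate([self.cache, np.asarray(chunk, dtype=float)])
        self.cache = x[-(len(self.kernel) - 1):]      # save tail for next call
        # 'valid' convolution over [cache | chunk] yields len(chunk) outputs
        return np.convolve(x, self.kernel, mode="valid")
```

Chaining `process` over arbitrary chunk sizes gives exactly the zero-left-padded offline convolution, which is what lets a model run on real-time audio buffers.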
    Kernel Thinning. (arXiv:2105.05842v7 [stat.ML] UPDATED)
    We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.  ( 2 min )
    Program Analysis of Probabilistic Programs. (arXiv:2204.06868v1 [cs.PL])
    Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference algorithm can be used as a probabilistic programming back-end that is simultaneously reliable, efficient, black-box, and general. Probabilistic programming languages often choose a single algorithm to apply to a given problem, thus inheriting its limitations. While substantial work has been done both to formalise probabilistic programming and to improve efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. This dissertation presents three novel techniques (both static and dynamic), which aim to improve probabilistic programming using program analysis. The techniques analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand.  ( 2 min )
    Optimal Stopping via Randomized Neural Networks. (arXiv:2104.13669v2 [stat.ML] UPDATED)
    This paper presents new machine learning approaches to approximate the solutions of optimal stopping problems. The key idea of these methods is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using simple linear regression, they are easy to implement and theoretical guarantees are provided. Our randomized reinforcement learning approach and randomized recurrent neural network approach outperform the state-of-the-art and other relevant machine learning approaches in Markovian and non-Markovian examples, respectively. In particular, we test our approaches on Black-Scholes, Heston, rough Heston and fractional Brownian motion. Moreover, we show that they can also be used to efficiently compute Greeks of American options.  ( 2 min )
    Semi-Discriminative Representation Loss for Online Continual Learning. (arXiv:2006.11234v4 [stat.ML] UPDATED)
    The use of episodic memory in continual learning has demonstrated effectiveness for alleviating catastrophic forgetting. In recent studies, gradient-based approaches have been developed to make more efficient use of compact episodic memory. Such approaches refine the gradients resulting from new samples by those from memorized samples, aiming to reduce the diversity of gradients from different tasks. In this paper, we clarify the relation between diversity of gradients and discriminativeness of representations, showing shared as well as conflicting interests between Deep Metric Learning and continual learning, thus demonstrating pros and cons of learning discriminative representations in continual learning. Based on these findings, we propose a simple method -- Semi-Discriminative Representation Loss (SDRL) -- for continual learning. In comparison with state-of-the-art methods, SDRL shows better performance with low computational cost on multiple benchmark tasks in the setting of online continual learning.  ( 2 min )
    Optimal Training of Fair Predictive Models. (arXiv:1910.04109v3 [stat.ML] UPDATED)
    Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the path-specific causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance by constraining baseline covariates, without requiring parametric models. We combine the merits of both proposals to optimize a hybrid reparameterized likelihood. The techniques presented here should be applicable more broadly to fair prediction proposals that impose constraints on predictive models.  ( 2 min )
    Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine. (arXiv:2204.07124v1 [stat.ML])
Dynamic treatment regimes (DTRs) are used in medicine to tailor sequential treatment decisions to patients by considering patient heterogeneity. Common methods for learning optimal DTRs, however, have shortcomings: they are typically based on outcome prediction and not treatment effect estimation, or they use linear models that are restrictive for patient data from modern electronic health records. To address these shortcomings, we develop two novel methods for learning optimal DTRs that effectively handle complex patient data. We call our methods DTR-CT and DTR-CF. Our methods are based on a data-driven estimation of heterogeneous treatment effects using causal tree methods, specifically causal trees and causal forests, that learn non-linear relationships, control for time-varying confounding, are doubly robust, and are explainable. To the best of our knowledge, our paper is the first that adapts causal tree methods for learning optimal DTRs. We evaluate our proposed methods using synthetic data and then apply them to real-world data from intensive care units. Our methods outperform state-of-the-art baselines in terms of cumulative regret and percentage of optimal decisions by a considerable margin. Our work improves treatment recommendations from electronic health records and is thus of direct relevance for personalized medicine.  ( 2 min )
    Gradient boosting for convex cone predict and optimize problems. (arXiv:2204.06895v1 [cs.LG])
Many problems in engineering and statistics involve both predictive forecasting and decision-based optimization. Traditionally, predictive models are optimized independently from the final decision-based optimization problem. In contrast, a `smart, predict then optimize' (SPO) framework optimizes prediction models to explicitly minimize the final downstream decision loss. In this paper we present dboost, a gradient boosting algorithm for training prediction model ensembles to minimize decision regret. The dboost framework supports any convex optimization program that can be cast as a convex quadratic cone program, and gradient boosting is performed by implicit differentiation of a custom fixed-point mapping. To our knowledge, the dboost framework is the first general-purpose implementation of gradient boosting for predict-and-optimize problems. Experimental results comparing with state-of-the-art SPO methods show that dboost can further reduce out-of-sample decision regret.  ( 2 min )
    Streamlined Variational Inference for Linear Mixed Models with Crossed Random Effects. (arXiv:1910.01799v3 [stat.ME] UPDATED)
    We derive streamlined mean field variational Bayes algorithms for fitting linear mixed models with crossed random effects. In the most general situation, where the dimensions of the crossed groups are arbitrarily large, streamlining is hindered by lack of sparseness in the underlying least squares system. Because of this fact we also consider a hierarchy of relaxations of the mean field product restriction. The least stringent product restriction delivers a high degree of inferential accuracy. However, this accuracy must be mitigated against its higher storage and computing demands. Faster sparse storage and computing alternatives are also provided, but come with the price of diminished inferential accuracy. This article provides full algorithmic details of three variational inference strategies, presents detailed empirical results on their pros and cons and, thus, guides the users on their choice of variational inference approach depending on the problem size and computing resources.  ( 2 min )
    Using Machine Learning for Particle Identification in ALICE. (arXiv:2204.06900v1 [nucl-ex])
    Particle identification (PID) is one of the main strengths of the ALICE experiment at the LHC. It is a crucial ingredient for detailed studies of the strongly interacting matter formed in ultrarelativistic heavy-ion collisions. ALICE provides PID information via various experimental techniques, allowing for the identification of particles over a broad momentum range (from around 100 MeV/$c$ to around 50 GeV/$c$). The main challenge is how to combine the information from various detectors effectively. Therefore, PID represents a model classification problem, which can be addressed using Machine Learning (ML) solutions. Moreover, the complexity of the detector and richness of the detection techniques make PID an interesting area of research also for the computer science community. In this work, we show the current status of the ML approach to PID in ALICE. We discuss the preliminary work with the Random Forest approach for the LHC Run 2 and a more advanced solution based on Domain Adaptation Neural Networks, including a proposal for its future implementation within the ALICE computing software for the upcoming LHC Run 3.  ( 2 min )

  • Open

    [D] What is the difference between channel-wise and self attention in this case?
Example: I fed 32 feature maps of dimension 6x6x32 into a Squeeze-and-Excitation layer, which assigns a weight to each of my channels through a channel-wise attention mechanism. What is the difference between passing these 32 feature maps into a Hybrid Transformer Encoder with patches of dimension 6x6? (So 1 patch for each channel.) As I understand it, channel attention says "which channel is important for the final prediction", while a transformer (with self-attention) tells us "where to focus our attention in a given context". Isn't that the same if the patches are the channels? Basically it tells us which patch to focus on, and if patch = channel, then squeeze-excitation = self-attention? submitted by /u/Rogitus [link] [comments]  ( 1 min )
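For readers unfamiliar with the squeeze-and-excitation mechanism the question describes, here is a dependency-free sketch of its data flow. The learned two-layer MLP of a real SE block is replaced by a bare sigmoid of the squeezed value, so this shows only the shape of the computation, not a trained layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def se_reweight(feature_maps):
    """Toy squeeze-and-excitation: one scalar weight per channel.

    feature_maps: list of channels, each a 2-D list (H x W)."""
    # Squeeze: global average pool each channel to a single scalar.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excite: map each scalar to a (0, 1) weight (real SE uses an MLP here).
    weights = [sigmoid(s) for s in squeezed]
    # Scale: multiply every activation in a channel by its weight.
    scaled = [[[v * w for v in row] for row in ch]
              for ch, w in zip(feature_maps, weights)]
    return scaled, weights

maps = [[[1.0, 1.0], [1.0, 1.0]], [[-2.0, 0.0], [0.0, -2.0]]]
_, w = se_reweight(maps)
print([round(x, 3) for x in w])  # → [0.731, 0.269]
```

Note the contrast with self-attention: the SE weight for a channel depends only on that channel's own pooled statistic, whereas a self-attention weight for a patch depends on its interaction (query-key products) with every other patch.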
    [P] Bounding.ai Launches New Marketplace for AI Labeled Data
    In a new announcement, Bounding.ai launched its marketplace for computer vision and AI teams to access training data easily. The platform is designed to empower individuals and small companies around the world to create and sell datasets that will be instantly accessible by any team in need of labeled data. Bounding.ai Launches New Marketplace for AI Labeled Data & $5,000 Prize submitted by /u/Freyr_AI [link] [comments]  ( 1 min )
[D] Unsupervised classification of words/phrases?
I have found most unsupervised text classification methods to be suited mainly to classifying documents containing relatively large amounts of words/sentences. However, I have a dataset whose entries contain only single words or phrases, not full sentences. The goal is to do unsupervised semantic classification on these words/phrases. Are there any existing algorithms for such a task? submitted by /u/Comprehensive-Egg707 [link] [comments]  ( 1 min )
    [N] Robot Arm Acts As "Hand And Eyes" of Language Model To Execute Real World Tasks With SayCan And Robotics At Google
    Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could in principle be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack contextual grounding, which makes it difficult to leverage them for decision making within a given real-world context. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide this grounding by means of pretrained behaviors, which are used to condition the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model’s “hands and eyes,” while the language model supplies high-level semantic knowledge about the task. We show how low-level tasks can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these tasks provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. Github: https://say-can.github.io/ Video of Robot Executing Commands: https://youtu.be/zOph99BjRqs?t=4 submitted by /u/SlightSituation [link] [comments]  ( 1 min )
    [D] AskScience AMA Series: We are seven leading scientists specializing in the intersection of machine learning and neuroscience. Ask Us Anything about computational neuroscience or science education!
    submitted by /u/blueneuronDOTnet [link] [comments]  ( 2 min )
    [D] How DALL-E 2 Actually Works
    Here's a video explaining the overall architecture of DALL-E 2 and how it actually works! Great overview for those who haven't had time to read the paper How does DALL-E 2 actually work? submitted by /u/SleekEagle [link] [comments]  ( 1 min )
    [N] Announcing the Learning on Graphs Conference!
    We think this new venue will be valuable for the Graph/Geometric Machine Learning community. Why? See our blogpost: https://michael-bronstein.medium.com/announcing-the-learning-on-graphs-conference-c63caed7347 The LoG Conference key facts: - Covers work broadly related to machine learning on graphs and geometry - Proceedings track published in PMLR - Also has a non-archival extended abstract track - Double blind review process on OpenReview - Top reviewers receive monetary rewards - First year: virtual December 9-12 2022, free to attend. Call for papers: https://logconference.github.io/cfp/ Stay updated via Twitter: https://twitter.com/LogConference Or LinkedIn: https://www.linkedin.com/company/log-conference Advisory board: Regina Barzilay (MIT), Xavier Bresson (NUS), Michael Bronstein (Oxford/Twitter), Stephan Günnemann (TUM), Stefanie Jegelka (MIT), Jure Leskovec (Stanford), Pietro Liò (Cambridge), Jian Tang (MILA/HEC Montreal), Jie Tang (Tsinghua), Petar Veličković (DeepMind), Soledad Villar (JHU), Marinka Zitnik (Harvard). Organizers: Yuanqi Du (DP Technology), Hannes Stärk (MIT), Derek Lim (MIT), Chaitanya Joshi (Cambridge), Andreea-Ioana Deac (Mila), Iulia Duta (Cambridge), Joshua Robinson (MIT). submitted by /u/Hannes-Stark [link] [comments]  ( 1 min )
    [P] Using Language Models to (probably) Read Faster
I explored using language models to highlight the more salient parts of a PDF file, which will hopefully help users read faster. The main idea is to highlight only the characters that the language model failed to predict. I have implemented this as an experimental feature in the sioyek PDF reader. Here is a blog post explaining this in full detail: https://ahrm.github.io/jekyll/update/2022/04/14/using-languge-models-to-read-faster.html submitted by /u/highergraphic [link] [comments]  ( 2 min )
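The post's core idea, highlight what the model found surprising, can be sketched in a few lines. The per-token probabilities below are hand-made stand-ins for real language-model outputs, and the threshold is an arbitrary illustrative choice.

```python
def pick_highlights(tokens, probs, threshold=0.5):
    """Return the tokens a reader should focus on: those the language
    model assigned low predictive probability (the 'surprising' ones)."""
    return [tok for tok, p in zip(tokens, probs) if p < threshold]

tokens = ["the", "cat", "sat", "on", "the", "quantum", "mat"]
probs  = [0.95, 0.40, 0.70, 0.90, 0.97, 0.01, 0.60]
print(pick_highlights(tokens, probs))  # → ['cat', 'quantum']
```

In a real pipeline the probabilities would come from a causal LM scoring each token given its left context; everything the model predicts confidently is left unhighlighted as skimmable.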
    [P] GANs and generating visually indeterminate images by error
(Please correct me if I'm using the wrong flair/on the wrong sub) I'm currently working on a project that focuses on GANs and generative art, particularly images that concern visual indeterminacy. I'm trying to find papers/articles that discuss the development/application of (any kind of) GAN in which, along the way or as a final result, images were generated that would be considered visually indeterminate; specifically, research in which the objective was to generate images with clear, recognizable objects/scenes. In my mind I'm looking for articles in which the GAN architecture is discussed, along with which parts of it could have influenced particular aspects of the incorrectly generated images. This probably wouldn't be the focus of any research, but I was wondering if anyone has ever come across a discussion section in a GAN paper, or could point me towards some areas or projects where I might find something that I could connect to my project. submitted by /u/mel4ncholi4 [link] [comments]  ( 1 min )
    [D] Ensemble methods (e.g. hard voting) in machine learning
    When should we consider ensemble methods in machine learning? Is there any statistical criteria using which we can decide, if doing ensemble may help? submitted by /u/flaubart9 [link] [comments]  ( 2 min )
    [D] How do you understand "Both FF𝐿 and FF𝑆 were 3-layer feed-forward networks with hidden dimensions of 1024 and 256, GeLU as the activation function and a dropout with probability 0.1 applied at their input."? (re-implementing https://arxiv.org/abs/2101.10587v1)
Hi, I am re-implementing the paper "Low Resource Recognition and Linking of Biomedical Concepts from a Large Ontology" https://arxiv.org/abs/2101.10587v1 . They describe one part of their model as "3-layer feed-forward networks with hidden dimensions of 1024 and 256, GeLU as the activation function and a dropout with probability 0.1 applied at their input." For me that's not enough information to uniquely characterize the network, but maybe for someone with more experience the intended structure is obvious. The output should be a scalar, so I assume that it's something like: Dropout(0.1) -> Linear(, 1024) -> Linear(1024, 256) -> GELU -> Linear(256, 1)? Or is the nonlinearity (GELU) normally applied after each step? submitted by /u/ldorigo [link] [comments]
    Why did SciNet not get more attention? [D]
    It seems to shatter previous benchmarks with a new, innovative architecture, yet it only has 3 citations and little to no attention from the community as far as I can see. Is it because time series forecasting is not very trendy right now or is there anything wrong with the paper? The paper in question: https://arxiv.org/pdf/2106.09305v2.pdf submitted by /u/vidul7498 [link] [comments]  ( 2 min )
    [D] Do you train and deploy models using just one framework or multiple frameworks at work?
Hi, I'm the creator of Pinferencia. Currently I'm designing the new-features to-do list. I want to know: do you train and deploy models using just one framework or multiple frameworks at work? For example, using PyTorch for both training and deployment, or TensorFlow/PyTorch for training and ONNX for deployment. View Poll submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 2 min )
    [P] Extremely short and simple implementation of Denoising Diffusion Model, for educational purpose
Randomly sampled MNIST output. It's not good, I know. Hi, I noticed there aren't that many simple implementations of DDPM, for example using MNIST. I had to make a presentation for my workplace seminar, so I implemented a simplified version of DDPM myself. The whole thing is under 200 lines of code: https://github.com/cloneofsimo/minDiffusion This implementation is missing MANY details, such as the U-Net model, etc. I think it is worth taking a look, especially if you are interested in the recent boom of diffusion models (such as DALL·E 2). submitted by /u/cloneofsimo [link] [comments]  ( 1 min )
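For readers skimming the digest, the closed-form forward (noising) process at the heart of DDPM fits in a few stdlib-only lines. The linear beta schedule follows the original DDPM paper; everything else here is a sketch, not the linked repo's code.

```python
import math
import random

def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noise_sample(x0, t, alpha_bars, rng=random):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0, 1)
            for v in x0]

alpha_bars = make_alpha_bars()
print(round(alpha_bars[0], 6))  # → 0.9999
```

Training then amounts to asking a network (the U-Net the post mentions) to predict the eps that was added at a random t; sampling runs the learned reverse process from pure noise.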
    [D] Kubernetes for ML - how are y'all doing it?
Have been involved with Mesos since 2013, and Kubernetes almost since its inception (and saw it win the "scheduler wars"). It's now being used for pretty much _all_ container workloads, including ML training and inference. Since it was built in the image of Borg (where search indexers and map-reduce jobs were preemptible, and serving search workloads had to be protected at all costs)[1], how is Kubernetes holding up for your current workflows? Are you using Kubeflow? Metaflow? A bespoke setup on top? [1] https://queue.acm.org/detail.cfm?id=2898444 submitted by /u/nqnielsen [link] [comments]  ( 2 min )
  • Open

    New to machine learning, want to simulate robotics in a 3d environment
My employer makes significant use of robotic weld cells, and while working with the equipment I've noticed what seems to be room for improvement in the programming. This is purely a personal academic project, as I am quite curious whether machine learning could produce comparable or superior results to the human-made programming used at work. However, there will unfortunately be areas of vagueness, because I need to stick to knowledge that is publicly available regarding their operations. I'm going to have to stick to more generic, publicly available reference material, and will not be able to share most, if any, of the end result. I would like to run simulations in a 3D environment, using machine learning to train a computer program to find the most efficient sequence of movements &…  ( 3 min )
    Does using a centralized critic always mean that the agents receive global observation?
    submitted by /u/No_Possibility_7588 [link] [comments]
    What algorithm would be suited for a “Just do it as good as you can” situation?
I'm really new to RL, so please bear with me if I'm making mistakes here, but I'm trying to make an environment that emulates a network of roads. The algorithm will need to generate a quick route between n destinations, where n is some number with an insane number of permutations, like 30 for example. This is like emulating the destinations required by a mailman's route on a map, and trying to find the fastest way to get to each one. The algorithm's sequence of decisions will be choosing a node to travel to, where each node represents a street intersection or a point where the street ends. By the time it has traveled to every destination using the nodes, it will review the network of nodes it used and sum the distance between each one to get the total distance of the route. The goal is to get the total distance as small as possible. Is this realistic for an RL problem, or do I need to engineer some way to determine whether every decision was good or bad? Could I build a mathematical way to approximate the quickest route and then reward the RL algorithm for generating a better route than the mathematically approximated one? I could try rewarding the algorithm at each decision by whether it reduced the total distance required to any target it has yet to visit. I could try to mathematically make this more viable... what do y'all think, should I do something like that? Am I headed in the right direction? Thanks for any and all help! submitted by /u/professorDissociate [link] [comments]  ( 3 min )
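One cheap way to get the "reward for beating an approximation" baseline the post floats is a greedy nearest-neighbor route: it is fast to compute and gives a total distance the RL agent can be rewarded for improving on. A minimal sketch, assuming Euclidean node coordinates (the coordinates below are made up):

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def greedy_route(start, stops):
    """Nearest-neighbor heuristic: always travel to the closest
    unvisited destination. Fast and simple, but not optimal."""
    route, remaining, here = [start], list(stops), start
    while remaining:
        here = min(remaining, key=lambda p: dist(here, p))
        remaining.remove(here)
        route.append(here)
    return route

def route_length(route):
    """Sum of leg distances along the route."""
    return sum(dist(a, b) for a, b in zip(route, route[1:]))

r = greedy_route((0.0, 0.0), [(0.0, 3.0), (0.0, 1.0), (0.0, 2.0)])
print(route_length(r))  # → 3.0
```

A per-episode reward like `baseline_length - agent_length` then avoids having to label every single decision as good or bad.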
    Where is env.nS for Frozen Lake in OpenAI Gym
    I am trying to run this: env4 = FrozenLakeEnv(map_name='4x4', is_slippery=False) env4.nS ​ I then get this error: 'FrozenLakeEnv' object has no attribute 'nS' ​ But I see it in the source code on line 151 and 152: https://github.com/openai/gym/blob/master/gym/envs/toy_text/frozen_lake.py ​ Edit: I'm trying to follow along with some tutorials online. Thank you for the help! submitted by /u/postdoc403b [link] [comments]  ( 1 min )
    Getting max/min action in DDPG and TD3
    I am using DDPG for a custom environment. My reward is positive (the sum-rate in a communication system). My problem is that I get the max or min action after a few training steps and it saturates with a non-optimized solution. How can I address this problem? I tried redesigning my reward to include positive and negative values but it didn’t work. I read that some people are using reward scaling. What is it and how would I scale it? I mean is there a specific method? I couldn’t find enough resources on that. Any help is much appreciated! submitted by /u/alicefaisal [link] [comments]  ( 1 min )
    Question about pseudocodes
Hi, I'm redoing all the RL algorithms in Python to better understand them. I'm mostly following Sutton and Barto, but the pseudocode there is often hard to follow. Do you know any other place where I can look? submitted by /u/New_neanderthal [link] [comments]  ( 1 min )
    Industry use of reinforcement learning
I have been studying RL for 18 months now with the goal of getting a job in it. Yet when I look at jobs, I very seldom see postings about it. I am wondering why that is the case. From my current understanding I could think of dozens of applications with huge potential gains. It feels like untapped potential. Or am I missing something? What do you think is the big obstacle to wider adoption of RL? Do you think it overlaps with classical control at the moment and is not justified? submitted by /u/Ouassimf [link] [comments]  ( 4 min )
    Comparing Default VS Custom Reward Function for Optimal Health Management of a DeepRL Agent Playing Tekken
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 2 min )
  • Open

    Fine-tune and deploy a Wav2Vec2 model for speech recognition with Hugging Face and Amazon SageMaker
    Automatic speech recognition (ASR) is a commonly used machine learning (ML) technology in our daily lives and business scenarios. Applications such as voice-controlled assistants like Alexa and Siri, and voice-to-text applications like automatic subtitling for videos and transcribing meetings, are all powered by this technology. These applications take audio clips as input and convert speech […]  ( 11 min )
    Build a virtual credit approval agent with Amazon Lex, Amazon Textract, and Amazon Connect
    Banking and financial institutions review thousands of credit applications per week. The credit approval process requires financial organizations to invest time and resources in reviewing documents like W2s, bank statements, and utility bills. The overall experience can be costly for the organization. At the same time, organizations have to consider borrowers, who are waiting for […]  ( 8 min )
  • Open

    AWS Cloud Migration: All You Need to Know
Businesses today face myriad challenges, some of which are successfully addressed with help from cloud computing. This is where AWS cloud migration comes in, promising to be a boon for businesses grappling with a sudden increase in traffic or looking for accelerated app deployment. It is also handy for cautious businesses that… Read More »AWS Cloud Migration: All You Need to Know The post AWS Cloud Migration: All You Need to Know appeared first on Data Science Central.  ( 3 min )
  • Open

    Web Crawling in Python
    In the old days, it was a tedious job to collect data, and sometimes very expensive. Machine learning projects cannot […] The post Web Crawling in Python appeared first on Machine Learning Mastery.  ( 12 min )
  • Open

    Kamikaze Drones in Russia’s War Against Ukraine Point to Future "Killer Robots"
    submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    AI News | Breakthrough AI Robot Arm Understanding From Google | OpenAI DALL-E 2 | AI Edge Computing In Space
    submitted by /u/getrich_or_diemining [link] [comments]  ( 1 min )
    DALL-E (Zero-Shot Text-to-Image Generation) -PART(1/2)
OpenAI released DALL·E 2 last week; this system can generate an image from a text description, and some of the results were truly amazing. In this blog, I discuss the ideas around DALL-E (version 1). DALL-E consists of two main components: a d-VAE (discrete Variational Autoencoder) and an autoregressive transformer. In Part 1 I focus on the d-VAE part, where I talk about the basic VAE and its ELBO formulation, then VQ-VAE, which eventually leads to d-VAE. Its reconstruction loss is formulated from the (bounded) logit-Laplace distribution, unlike the typical L1 or L2. Overall, this part explains how a discrete vector (token) can be generated for an input image. submitted by /u/rakshith291 [link] [comments]  ( 1 min )
    My first attempt at machine learning. I made a cool chatbot 😎
    I made a self learning conversational chatbot in ReactJS. It does nothing but reply to user messages and only understands text, for now 😄 https://xalen.netlify.app What do you think? Yea or Nay? submitted by /u/GameTide [link] [comments]  ( 1 min )
    Music video about AI
    submitted by /u/starlightinspace [link] [comments]
    Computational reasoning about incomputability, infinity, truth etc (Gödel, Tarski,...)
So I am curious about the theoretical foundations for making sense of higher-level abstract reasoning, like reasoning about infinities, incomputability, and truth (which we know cannot be defined, due to Tarski), in the field of artificial intelligence. It seems that, due to Gödel-like constructions, you are forced into inconsistent systems of reasoning when operating within a computable system. But those prove everything and "nothing", so as far as I understand it, that kind of upends the whole system of reason that the notion of artificial intelligence (and its correct functioning) is based on. Personally, because of this, I don't see that the notion in the title is particularly coherent, which means there are somewhat strong limits on what (computable) AI will be able to do. But I would be curious how people who think otherwise (which seems to be most of the AI community?) approach this. Would you say inconsistency can somehow be avoided, or that despite inconsistency you can get reliably correct results? submitted by /u/bejaq [link] [comments]  ( 6 min )
    The best explanation of What is Machine Learning and How it works? MUST WATCH
    submitted by /u/mr-minion [link] [comments]
    Artificial Nightmares: Hills Have Eyes || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
  • Open

how do I fix this? (I'm trying to predict sine; the blue dots are its guesses and the white line is the "correct" answer)
    submitted by /u/-i-hate-this-place- [link] [comments]  ( 2 min )
    Neuroevolution of Augmenting Topologies Course
    Hey all, There's a new course on the Neuroevolution of Augmenting Topologies (NEAT) algorithm. It's a niche algorithm, but uses some very interesting mechanisms to train/evolve simple irregular neural networks. Thought some of you may be interested. submitted by /u/Cogitarius [link] [comments]
  • Open

    Startup Transforms Meeting Notes With Time-Saving Features
    Gil Makleff and Artem Koren are developing AI for meeting transcripts, creating time-savers like shareable highlights of the text that is often TL;DR (too long; didn’t read). The Sembly founders conceived the idea after years of working in enterprise operational consulting at UMT Consulting Group, which was acquired by Ernst & Young. “We had an Read article > The post Startup Transforms Meeting Notes With Time-Saving Features appeared first on NVIDIA Blog.  ( 3 min )
    A Night to Behold: Researchers Use Deep Learning to Bring Color to Night Vision
    Talk about a bright idea. A team of scientists has used GPU-accelerated deep learning to show how color can be brought to night-vision systems.  In a paper published this week in the journal PLOS One, a team of researchers at the University of California, Irvine led by Professor Pierre Baldi and Dr. Andrew Browne, describes how Read article > The post A Night to Behold: Researchers Use Deep Learning to Bring Color to Night Vision appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Data Scientists vs. BI Developer: What’s the Difference?
    Here’s the truth.  ( 1 min )
  • Open

    Learning to think critically about machine learning
    A multidisciplinary team of graduate students helps infuse ethical computing content into MIT’s largest machine learning course.  ( 7 min )

  • Open

    [D] LOOCV
I'm working with a small dataset (~400 labeled examples). I plan to use logistic regression. Does it make sense / is it necessary to have a hold-out validation set along with doing leave-one-out cross-validation (LOOCV)? (E.g. hold 20% out, and run LOOCV on the remaining data.) submitted by /u/yontbont1 [link] [comments]  ( 1 min )
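For intuition, LOOCV itself is easy to sketch without any library: train on n-1 points, score the single held-out point, repeat n times. The nearest-centroid classifier below is a stdlib-only stand-in for logistic regression, and the 1-D data are made up.

```python
def nearest_centroid_predict(train, x):
    """Predict the label whose class mean is closest to x.
    A stand-in for logistic regression, to keep the sketch stdlib-only."""
    sums, counts = {}, {}
    for xi, yi in train:
        sums[yi] = sums.get(yi, 0.0) + xi
        counts[yi] = counts.get(yi, 0) + 1
    return min(sums, key=lambda y: abs(sums[y] / counts[y] - x))

def loocv_accuracy(data):
    """Leave-one-out CV: n fits, each leaving out exactly one point."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_centroid_predict(train, x) == y
    return hits / len(data)

data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.9, 1), (1.0, 1), (1.1, 1)]
print(loocv_accuracy(data))  # → 1.0
```

With ~400 examples this means ~400 model fits, which is cheap for logistic regression; a separate hold-out set on top of LOOCV mainly buys an unbiased final estimate after any hyperparameter tuning done inside the CV loop.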
    [D] What's the probability distribution of the Feature importances in an ensemble method?
Assume feature importance as defined by mean decrease in impurity. I'm curious whether there are any studies about its distribution. I'm thinking about using a statistical test to check whether a feature is relevant or not; all I can find is using the standard deviation as a measure of noise. Additionally, I wonder whether we can estimate the probability of one feature being more relevant than another, given their feature importances. submitted by /u/FellowOfHorses [link] [comments]  ( 1 min )
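One rough, non-rigorous approach along those lines: treat the per-tree importances as samples, and estimate P(feature A more relevant than B) as the fraction of trees where A's impurity decrease is larger. A toy sketch with synthetic numbers (in sklearn the per-tree values would come from `[t.feature_importances_ for t in model.estimators_]`); note this ignores the correlation between trees trained on overlapping bootstrap samples, so treat it as a heuristic, not a proper test:

```python
import random
from statistics import mean, stdev

random.seed(1)
# Hypothetical per-tree importance samples for two features across a
# 100-tree ensemble.
feat_a = [random.gauss(0.30, 0.05) for _ in range(100)]
feat_b = [random.gauss(0.25, 0.05) for _ in range(100)]

# Crude estimate of P(A more relevant than B): the fraction of trees
# in which A's impurity decrease exceeds B's.
p_a_over_b = mean(a > b for a, b in zip(feat_a, feat_b))

# A rough one-sample z-score against a hypothetical "pure noise"
# importance level, using the across-tree spread as the noise measure.
noise_level = 0.10
z = (mean(feat_a) - noise_level) / (stdev(feat_a) / len(feat_a) ** 0.5)
print(round(p_a_over_b, 2), round(z, 1))
```

A cleaner baseline for "is this feature relevant at all" is permutation importance against shuffled labels, which sidesteps the unknown distribution entirely.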
    [Discussion] Collecting Feedback for FinRL: Financial Reinforcement Learning
Dear all, As the creator of the open-source FinRL project, I would like to welcome all kinds of feedback regarding financial reinforcement learning, especially about how to improve the open-source FinRL project. After several years of development and maintenance, we have passed the phase of caring about #stars; now we care more about #downloads, and also Wall Street's adoption. I appreciate your feedback and sharing! Previously, when we posted our message on Reddit, the community was not very supportive of open-source projects' "advertisements"; perhaps it consumed public attention and raised bad feelings. Therefore, this time we created a subreddit for FinRL-related discussions, available at: https://www.reddit.com/r/AI4Finance_FinRL/ Best, Yang submitted by /u/Character-Meat-9176 [link] [comments]  ( 1 min )
    [D] What fun things in ML would you give a presentation on?
    If you had 30 minutes to present something fun and exciting to a semi-technical audience, what would you talk about on Machine Learning that would gain interest and engagement? submitted by /u/aero_gsr [link] [comments]  ( 1 min )
    [D] Evaluation and iteration for production models - how?
    How do you evaluate and improve your models in production (particularly for complex modalities like text/vision/audio)? Good models are hard In my experience from managing our CNN-based text classification & NER model at a small media analytics startup, evaluating and improving models is a mess. Our domain is fairly niche and diverse, so getting enough training data has been challenging and I mix in custom synthetics & augmentations (which can cause weird model artifacts if you're not careful). It takes a lot of time to discover tricky failure cases by either 1) observing production traffic or 2) probing manually, and then it takes even more time to get the right data to improve model behavior. Are good models hard? What's your approach to model evaluation & targeted improvement? Are there any known best practices? I'm a bit at a loss here. As mentioned, I'm specifically interested in others who have deep models as an important part of their product or pipeline across any task or modality. More particularly: How wrong is your model? How do you test it? How would you know about errors before and after it's deployed? How much of your time do you spend on iterating on your models? For what kind of issue? Which aspects are most useful to you for improving model performance and reducing critical errors? Maybe I'll take some of the more general ideas from my work and build them out into an evaluation & iteration framework. It's currently a hybrid web of synthetic, interactive/probing and classical approaches. Or maybe there is some approach/library that makes iteration easier without me having to do anything :) submitted by /u/flotothemoon [link] [comments]  ( 2 min )
    [D] Trace norm in KFAC paper for regularization
Hi, I suspect that the trace norm of the Kronecker product is mistaken in the KFAC paper (https://arxiv.org/abs/1503.05671). Shouldn't the division in the blue mark be replaced by multiplication? https://preview.redd.it/quoxpzubfgt81.png?width=1241&format=png&auto=webp&s=19e7b60628302f3cb37ba42944088d89d7a7bd28 submitted by /u/Cautious_Proposal132 [link] [comments]  ( 1 min )
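For reference when checking that step: the singular values of a Kronecker product are the pairwise products of the factors' singular values, so the trace (nuclear) norm factorizes multiplicatively rather than as a quotient:

```latex
\sigma_{ij}(A \otimes B) = \sigma_i(A)\,\sigma_j(B)
\;\;\Longrightarrow\;\;
\|A \otimes B\|_{*}
  = \sum_{i,j} \sigma_i(A)\,\sigma_j(B)
  = \Big(\sum_i \sigma_i(A)\Big)\Big(\sum_j \sigma_j(B)\Big)
  = \|A\|_{*}\,\|B\|_{*}.
```

Whether this resolves the blue-marked expression depends on the paper's surrounding definitions, which are not reproduced here.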
    [D] To what extent can Rust be used for Machine Learning?
I recently saw that some parts of the HuggingFace ecosystem use Rust under the hood, and HF is a large ecosystem. I've also heard from some of my friends that they had to learn Rust as the first thing at an ML company (it's their first job, so they couldn't explain to me exactly why). My questions are: What are the pros and cons compared to Python? Are there any good frameworks in Rust for ML? Is there a decent community and good documentation for Rust? Is learning it a fun experience? Is it used only for deployment? The reason I'm asking is that I really love to learn by doing. So, if I engaged in learning a bit of Rust for ML purposes, would I be able to create something ML-like right off the bat? It can be something as simple as an MNIST classifier. Take note that I don't know anything about Rust, so these questions might seem noob-like, but I believe the answers can be of help to others as well. submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 5 min )
    [D] Tips for using Ensemble Learning with a small dataset
I have started to look into using an ensemble of relatively shallow MLPs to predict from a small dataset (~100 training samples). I was looking specifically at bagging (bootstrap aggregation) as a way of improving prediction accuracy. I was curious whether there are any heuristics for how many models to include in a bagging ensemble. Also, more generally, am I on the correct path, or is there a better direction given my situation? A different ensemble technique, or a different path altogether? Any advice would be appreciated. submitted by /u/Fritos121 [link] [comments]  ( 1 min )
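For intuition, bagging is just bootstrap-resample, fit, average. A toy sketch with a deliberately weak, hypothetical learner (least-squares slope through the origin) on synthetic data:

```python
import random
from statistics import mean

random.seed(2)

# Hypothetical tiny regression problem, ~100 samples as in the post:
# y = 2x + Gaussian noise.
xs = [random.uniform(0.0, 10.0) for _ in range(100)]
data = [(x, 2.0 * x + random.gauss(0.0, 1.0)) for x in xs]

def fit_slope(sample):
    """Deliberately weak learner: least-squares slope through the origin."""
    return sum(x * y for x, y in sample) / sum(x * x for x, _ in sample)

def bagged_predict(x, n_models=25):
    """Bagging: fit each model on a bootstrap resample, average predictions."""
    preds = []
    for _ in range(n_models):
        boot = [random.choice(data) for _ in range(len(data))]
        preds.append(fit_slope(boot) * x)
    return mean(preds)

print(round(bagged_predict(5.0), 1))  # close to 10.0, since y ≈ 2x
```

As for how many models: there is no hard rule, but performance typically plateaus somewhere in the tens of estimators, and out-of-bag error is a cheap way to check when adding more stops helping; sklearn's `BaggingRegressor`/`BaggingClassifier` with `oob_score=True` give this directly.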
    [D] What JAX NN library to use?
I've been exploring the JAX ecosystem and its many neural network libraries, but I can't seem to settle on one. The main five I am considering are Trax, Objax, Equinox, Flax, and Elegy; however, I would like to hear which JAX NN library you use and why. submitted by /u/Southern-Trip-1102 [link] [comments]
  • Open

    Locked-image Tuning: Adding Language Understanding to Image Models
    Posted by Andreas Steiner and Basil Mustafa, Research Software Engineers at Google Research, Brain team The ability to classify images into categories has been transformed by deep learning. It has also been significantly accelerated by transfer learning, whereby models are first pre-trained on large datasets, like ImageNet, to learn visual representations that are then transferred via fine-tuning to a new task with less data (e.g., classifying animals). Previous works such as BiT and ViT employed these methods to achieve state-of-the-art performance on a wide range of classification tasks, such as the VTAB benchmark. However, fine-tuning has some downsides: though pre-training is done only once, fine-tuning is necessary on every new dataset for which task-specific data is needed. Multimo…  ( 7 min )
  • Open

    questions to ask an AI?
I recently played a game called Tacoma that had a focus on AI. In the game there was a guide for AI that posed 4 hypotheticals to ask an AI to check its morality, and it got me thinking how useful that would be for a real self-aware intelligence. So I want to make a list of questions/hypotheticals to ask AGIs. If you had to interview a recently created sentient AI, what questions or hypotheticals would you give it to gauge its morality, intelligence, creativity, emotion, etc.? submitted by /u/neonvolta [link] [comments]  ( 1 min )
    YouTuber Meets His Creepy Robot Double and Freaks Out
    submitted by /u/estasfuera [link] [comments]
    Reference request for applications of time to ai
    Does anyone know of any AI papers, books articles etc that discuss using a sense of time to develop AI, (especially real world time)? I've come across papers that discuss how having a sense of time seems to play a role in animal cognition (e.g. temporal cognition), and I'm curious to what extent this has influenced the development of AI. Thanks in advance submitted by /u/patterntheoryacc [link] [comments]  ( 1 min )
    IBM Data Science and AI Programs on Coursera Free for 30 Days
    submitted by /u/awsconsultant [link] [comments]
    Are you aware of these AI Ethical Challenges?
    submitted by /u/JencyJane [link] [comments]
    Synthetic²: Can AI Be A Powerful Force For Creation? | SiGMA/AGS UAE 2022
    submitted by /u/thedyezwfl [link] [comments]
    Google finance chief: "We automate everything that can be automated"
    submitted by /u/much_successes [link] [comments]
    Free Webinar series | Automated CV Pipelines | Instance Classification
    Automated CV Pipelines 3rd part is open for registration. It will be covering the methods of streamlining instance classification. If you are interested to check out, here is the link to register. submitted by /u/WeekendClassic [link] [comments]
    "My A.I. writes music better than humans. World-class education in A.I. + music -> decades of work -> censored from Facebook, Twitter, soon to be downvoted or unfairly-banned from Reddit. It's making the most beautiful music I've ever heard, and society despises it."
    Thirty years it's taken me, A.I. that is not just as good as humans but better than humans at composing music: https://i.imgur.com/hReXJq1.png It passes the Turing Test, and it is also a revolution in the field of music in and of itself. In the meantime, no one has said anything nice to me in thirty years; just insults. I would feel dumb rewarding humanity with my creation; it would send the wrong message; it would affirm their bad behavior. Garbage species. Low IQ. submitted by /u/PussyFiller2022 [link] [comments]  ( 1 min )
  • Open

    PPO with one worker always picking the best action?
If I use PPO with distributed workers, and one of the workers always picks the best action, would that skew the PPO algorithm? It might perform a tad slower, but would it actually introduce wrong math, perhaps because the PPO optimization requires that actions are sampled in proportion to their probabilities? Or would it (mathematically) not matter? submitted by /u/tmuxed [link] [comments]  ( 1 min )
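The crux is the importance ratio. PPO's clipped surrogate, L = min(r·A, clip(r, 1−ε, 1+ε)·A) with r = π_new(a|s)/π_old(a|s), assumes the action was actually sampled from π_old; a worker that always takes argmax samples from a different (deterministic) behaviour policy, so its ratios use the wrong denominator. A toy numeric sketch with made-up probabilities:

```python
# Toy illustration of the PPO clipped surrogate for a single transition.

def clipped_surrogate(r, advantage, eps=0.2):
    """L = min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_r = max(1.0 - eps, min(r, 1.0 + eps))
    return min(r * advantage, clipped_r * advantage)

# Stochastic worker: the logged probability really was pi_old(a|s) = 0.5.
print(clipped_surrogate(0.6 / 0.5, 1.0))   # prints 1.2

# Greedy worker: it takes this action with probability 1.0, but if the
# stochastic policy's 0.5 is logged as pi_old anyway, the sample is
# overweighted by a factor of 1 / 0.5 relative to the true behaviour.
print(clipped_surrogate(0.6 / 1.0, 1.0))   # prints 0.6
```

So it is not just slower: if the greedy worker's transitions enter the update with the stochastic policy's probabilities as π_old, the ratios (and hence the gradient estimate) are biased; treating that worker's data as off-policy, or excluding it from the PPO update and using it only for evaluation, avoids the problem.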
    Is a steady linear increase in average reward during training too good to be true? Are there any common pitfalls?
    submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    Determine Gridworld values with no probability
I am learning reinforcement learning for games following gridworld examples. Apologies in advance if this is a basic question; I'm very new to reinforcement learning. I am slightly confused in scenarios where the probabilities of moving up, down, left and right are not provided or stated. In this scenario, I assume we follow the optimal policy, and therefore apply the Bellman equation as: V(s) = max_a [R(s, a) + γ V(s')]. The cost for any movement is 0, and an agent can choose to terminate at a numbered grid square to collect a reward equal to that square's number. This is why my square closest to the reward takes the value 8, since it will terminate after moving to the next state to collect the reward. Would this be the correct way to determine the values for the surrounding grid squares? https://preview.redd.it/s9l0ok4kbgt81.png?width=806&format=png&auto=webp&s=dfb50450001541b0569d0361fd04a73daa29f222 submitted by /u/Artezian [link] [comments]  ( 2 min )
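One way to sanity-check hand-computed values like these is to run value iteration on a toy version of the problem; a minimal sketch on a hypothetical 1-D grid (not the exact grid in the screenshot):

```python
# Minimal value iteration for a deterministic 1-D gridworld:
# states 0..4, where entering state 4 terminates and pays reward 8;
# movement itself costs 0, discount gamma = 0.9.
# Bellman optimality backup: V(s) = max_a [R(s, a) + gamma * V(s')].

GAMMA = 0.9
N, TERMINAL, REWARD = 5, 4, 8.0

V = [0.0] * N
for _ in range(50):                      # sweep backups until convergence
    for s in range(TERMINAL):            # terminal state keeps V = 0
        best = float("-inf")
        for nxt in (s - 1, s + 1):       # actions: step left / step right
            if 0 <= nxt < N:
                r = REWARD if nxt == TERMINAL else 0.0
                best = max(best, r + GAMMA * V[nxt])
        V[s] = best

print([round(v, 3) for v in V])  # [5.832, 6.48, 7.2, 8.0, 0.0]
```

The square adjacent to the terminal indeed gets the full reward 8 (the terminating move's reward is not discounted), and each step further away multiplies by γ, matching the reasoning in the post.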
  • Open

    Three from MIT awarded 2022 Paul and Daisy Soros Fellowships for New Americans
    Fellowship funds graduate studies for outstanding immigrants and children of immigrants.  ( 6 min )
  • Open

    Latest Research From Stanford Introduces ‘Domino’: A Python Tool for Identifying and Describing Underperforming Slices in Machine Learning Models
Machine learning and Artificial Intelligence models have achieved promising results in recent years. The major factor behind their success is the availability and development of vast datasets. However, regardless of how many terabytes of data you have or how skilled you are at data science, machine learning models will be useless and even dangerous if you can’t make sense of data records. A slice is a collection of data samples with a common feature. For example, in a picture dataset, photographs of antique vehicles make up a slice. When a model’s performance on the data samples in a slice is significantly lower than its overall performance, the slice is considered underperforming. Deploying models underperforming on crucial data slices could seriously harm safety and fairness. For instance, models trained to detect collapsed lungs in chest X-rays generally make predictions based on the presence of chest drains, a common therapeutic device. As a result, computer models typically fail to detect collapsed lungs in images without chest drains, a critical data slice in which inaccurate negative predictions could be catastrophic. Not many studies have considered underperforming slices during model evaluation. Researchers believe that knowing which slices their models underperform on would help practitioners not just make better decisions regarding model deployment but also improve model robustness by upgrading the training dataset or utilizing robust optimization strategies. Detecting slices is challenging because the “hidden” data slices are linked by a notion that isn’t easily derived from unstructured inputs or labeled in metadata (e.g., images, video, time-series data). Continue reading the summary. Paper: https://arxiv.org/pdf/2203.14960.pdf Article: http://ai.stanford.edu/blog/domino/ Github: https://github.com/HazyResearch/domino submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
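For intuition only (this is not Domino's algorithm, which discovers hidden slices from embeddings): once a candidate slice is labeled, flagging it as underperforming reduces to a per-group accuracy comparison. A toy sketch with made-up numbers echoing the chest-drain example:

```python
from collections import defaultdict

# Hypothetical records of (slice_tag, prediction_correct). Domino itself
# discovers *hidden* slices; this sketch only scores slices that are
# already labeled in metadata.
records = (
    [("chest_drain", True)] * 90 + [("chest_drain", False)] * 10
    + [("no_drain", True)] * 55 + [("no_drain", False)] * 45
)

by_slice = defaultdict(list)
for tag, correct in records:
    by_slice[tag].append(correct)

overall = sum(c for _, c in records) / len(records)
underperforming = [
    tag for tag, outcomes in by_slice.items()
    if sum(outcomes) / len(outcomes) < overall - 0.10  # 10-point gap
]
print(round(overall, 3), underperforming)  # 0.725 ['no_drain']
```

The hard part, which the paper addresses, is producing slice tags when no such metadata column exists.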
    NN from Scratch: #3 Forward propagation | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
  • Open

    AI-generated easter eggs
    How would AI decorate an easter egg? I've tried this before by training an image-generating model exclusively on pictures of easter eggs I decorated (they came out plain, if a bit wobbly). I decided to see what I would get using a model based on CLIP, which has  ( 3 min )
    Bonus: What does the x-ray of an Easter egg look like?
    AI Weirdness: the strange side of machine learning  ( 1 min )
  • Open

    R-Learning AI self-taking over processes
    An inside look at how REINFORCEMENT learning, without past reference, extracts “optimal” decisions through simple interaction …  ( 10 min )
  • Open

    GFN Thursday Gears Up With More Electronic Arts Games on GeForce NOW
    This GFN Thursday delivers more gr-EA-t games as two new titles from Electronic Arts join the GeForce NOW library. Gamers can now enjoy Need for Speed HEAT  and Plants vs. Zombies Garden Warfare 2 streaming from GeForce NOW to underpowered PCs, Macs, Chromebooks, SHIELD TV and mobile devices. It’s all part of the eight  total Read article > The post GFN Thursday Gears Up With More Electronic Arts Games on GeForce NOW appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    How AI is Changing Digital Marketing
    What is Artificial Intelligence? Oxford Languages defines AI as the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. For those of us working in the realm of digital marketing, the impact has become even more clear over… Read More »How AI is Changing Digital Marketing The post How AI is Changing Digital Marketing appeared first on Data Science Central.  ( 4 min )
    AI For Compliance: What, Why, How
    With the constant rise and use of technology, Artificial Intelligence (AI) has become a great companion to compliance. Compliance is one of the biggest playing fields and plays a pivotal role in banking institutions. It aims to identify, diminish, and manage risks such as insider trading, spoofing attacks, exploitation of the market, front-running, and more by… Read More »AI For Compliance: What, Why, How The post AI For Compliance: What, Why, How appeared first on Data Science Central.  ( 9 min )
    Benefits of Data Governance and Compliance
    While data compliance is the practice of organizations ensuring that all sensitive data is managed and organized in a way that enables them to meet their business rules alongside legal and governmental regulations, data governance involves the process of managing organizational data’s usability, security, availability, and quality using the internally set rules and policies. Data… Read More »Benefits of Data Governance and Compliance The post Benefits of Data Governance and Compliance appeared first on Data Science Central.  ( 3 min )
    How To Write A Technical Dissertation
    Technical dissertation writing sometimes seems impossible until it is done. A dissertation is among the lengthiest tasks that can take months to get completed. Thus, it exhausts students, but there is no way around it. It is worth more than about 60 credits in a thesis-based degree. Moreover, gathering proper knowledge and top guidelines about… Read More »How To Write A Technical Dissertation The post How To Write A Technical Dissertation appeared first on Data Science Central.  ( 5 min )
    Why Data Engineers are in Greater Demand than Data Scientists
    Globally, many think that data scientist is the best job after Harvard declared it to be one of the hottest jobs of the decade.  And since then, many have been choosing it as their career path. But the role of a data engineer is as important as the data scientist is, because if a data… Read More »Why Data Engineers are in Greater Demand than Data Scientists The post Why Data Engineers are in Greater Demand than Data Scientists appeared first on Data Science Central.  ( 3 min )
  • Open

    When to Go, and When to Explore: The Benefit of Post-Exploration in Intrinsic Motivation. (arXiv:2203.16311v2 [cs.LG] UPDATED)
    Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper we present a systematic study of post-exploration, answering open questions that the Go-Explore paper did not answer yet. First, we study the isolated potential of post-exploration, by turning it on and off within the same algorithm. Subsequently, we introduce new methodology to adaptively decide when to post-explore and for how long to post-explore. Experiments on a range of MiniGrid environments show that post-exploration indeed boosts performance (with a bigger impact than tuning regular exploration parameters), and this effect is further enhanced by adaptively deciding when and for how long to post-explore. In short, our work identifies adaptive post-exploration as a promising direction for RL exploration research.
    Estimators of Entropy and Information via Inference in Probabilistic Models. (arXiv:2202.12363v2 [stat.ML] UPDATED)
    Estimating information-theoretic quantities such as entropy and mutual information is central to many problems in statistics and machine learning, but challenging in high dimensions. This paper presents estimators of entropy via inference (EEVI), which deliver upper and lower bounds on many information quantities for arbitrary variables in a probabilistic generative model. These estimators use importance sampling with proposal distribution families that include amortized variational inference and sequential Monte Carlo, which can be tailored to the target model and used to squeeze true information values with high accuracy. We present several theoretical properties of EEVI and demonstrate scalability and efficacy on two problems from the medical domain: (i) in an expert system for diagnosing liver disorders, we rank medical tests according to how informative they are about latent diseases, given a pattern of observed symptoms and patient attributes; and (ii) in a differential equation model of carbohydrate metabolism, we find optimal times to take blood glucose measurements that maximize information about a diabetic patient's insulin sensitivity, given their meal and medication schedule.
    Estimating permeability of 3D micro-CT images by physics-informed CNNs based on DNS. (arXiv:2109.01818v2 [cs.LG] UPDATED)
In recent years, convolutional neural networks (CNNs) have attracted increasing interest for their ability to perform a fast approximation of effective hydrodynamic parameters in porous media research and applications. This paper presents a novel methodology for permeability prediction from micro-CT scans of geological rock samples. The training data set for CNNs dedicated to permeability prediction consists of permeability labels that are typically generated by classical lattice Boltzmann methods (LBM) that simulate the flow through the pore space of the segmented image data. We instead perform direct numerical simulation (DNS) by solving the stationary Stokes equation in an efficient and distributed-parallel manner. As such, we circumvent the convergence issues of LBM that frequently are observed on complex pore geometries, and therefore, improve the generality and accuracy of our training data set. Using the DNS-computed permeabilities, a physics-informed CNN (PhyCNN) is trained by additionally providing a tailored characteristic quantity of the pore space. More precisely, by exploiting the connection to flow problems on a graph representation of the pore space, additional information about confined structures is provided to the network in terms of the maximum flow value, which is the key innovative component of our workflow. The robustness of this approach is reflected by very high prediction accuracy, which is observed for a variety of sandstone samples from archetypal rock formations.
    Highly efficient reliability analysis of anisotropic heterogeneous slopes: Machine Learning aided Monte Carlo method. (arXiv:2204.06098v1 [cs.LG])
    Machine Learning (ML) algorithms are increasingly used as surrogate models to increase the efficiency of stochastic reliability analyses in geotechnical engineering. This paper presents a highly efficient ML aided reliability technique that is able to accurately predict the results of a Monte Carlo (MC) reliability study, and yet performs 500 times faster. A complete MC reliability analysis on anisotropic heterogeneous slopes consisting of 120,000 simulated samples is conducted in parallel to the proposed ML aided stochastic technique. Comparing the results of the complete MC study and the proposed ML aided technique, the expected errors of the proposed method are realistically examined. Circumventing the time-consuming computation of factors of safety for the training datasets, the proposed technique is more efficient than previous methods. Different ML models, including Random Forest (RF), Support Vector Machine (SVM) and Artificial Neural Networks (ANN) are presented, optimised and compared. The effects of the size and type of training and testing datasets are discussed. The expected errors of the ML predicted probability of failure are characterised by different levels of soil heterogeneity and anisotropy. Using only 1% of MC samples to train ML surrogate models, the proposed technique can accurately predict the probability of failure with mean errors limited to 0.7%. The proposed technique reduces the computational time required for our study from 306 days to only 14 hours, providing 500 times higher efficiency.
    OntoProtein: Protein Pretraining With Gene Ontology Embedding. (arXiv:2201.11147v3 [q-bio.BM] UPDATED)
Self-supervised protein language models have proved their effectiveness in learning protein representations. With increasing computational power, current protein language models pre-trained with millions of diverse sequences can advance the parameter scale from million-level to billion-level and achieve remarkable improvements. However, those prevailing approaches rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better protein representations. We argue that informative biological knowledge in KGs can enhance protein representations with external knowledge. In this work, we propose OntoProtein, the first general framework that incorporates the structure of GO (Gene Ontology) into protein pre-training models. We construct a novel large-scale knowledge graph that consists of GO and its related proteins, where gene annotation texts or protein sequences describe all nodes in the graph. We propose novel contrastive learning with knowledge-aware negative sampling to jointly optimize the knowledge graph and protein embedding during pre-training. Experimental results show that OntoProtein can surpass state-of-the-art methods with pre-trained protein language models on the TAPE benchmark and yields better performance compared with baselines in protein-protein interaction and protein function prediction. Code and datasets are available at https://github.com/zjunlp/OntoProtein.
    A streamable large-scale clinical EEG dataset for Deep Learning. (arXiv:2203.02552v2 [cs.LG] UPDATED)
    Deep Learning has revolutionized various fields, including Computer Vision, Natural Language Processing, as well as Biomedical research. Within the field of neuroscience, specifically in electrophysiological neuroimaging, researchers are starting to explore leveraging deep learning to make predictions on their data without extensive feature engineering. The availability of large-scale datasets is a crucial aspect of allowing the experimentation of Deep Learning models. We are publishing the first large-scale clinical EEG dataset that simplifies data access and management for Deep Learning. This dataset contains eyes-closed EEG data prepared from a collection of 1,574 juvenile participants from the Healthy Brain Network. We demonstrate a use case integrating this framework, and discuss why providing such neuroinformatics infrastructure to the community is critical for future scientific discoveries.
    Research on Intellectual Property Resource Profile and Evolution Law. (arXiv:2204.06221v1 [cs.DL])
In the era of big data, intellectual property-oriented scientific and technological resources show a trend of large data scale, high information density and low value density, which brings severe challenges to the effective use of intellectual property resources, while the demand for mining hidden information in intellectual property is increasing. This makes intellectual property-oriented science and technology resource profiling, and the analysis of its evolution, a current research hotspot. This paper reviews methods for constructing intellectual property resource profiles, together with the prerequisite tasks of property entity extraction and entity completion, from the perspectives of algorithm classification and general workflow, and discusses directions for improving future methods.
    Adjacency constraint for efficient hierarchical reinforcement learning. (arXiv:2111.00213v3 [cs.LG] UPDATED)
    Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is large. Searching in a large goal space poses difficulty for both high-level subgoal generation and low-level policy learning. In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacent region of the current state using an adjacency constraint. We theoretically prove that in a deterministic Markov Decision Process (MDP), the proposed adjacency constraint preserves the optimal hierarchical policy, while in a stochastic MDP the adjacency constraint induces a bounded state-value suboptimality determined by the MDP's transition structure. We further show that this constraint can be practically implemented by training an adjacency network that can discriminate between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks including challenging simulated robot locomotion and manipulation tasks show that incorporating the adjacency constraint significantly boosts the performance of state-of-the-art goal-conditioned HRL approaches.
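For intuition, the adjacency restriction in a deterministic gridworld amounts to limiting candidate subgoals to the k-step BFS ball around the current state (the paper's adjacency network learns this relation from data; the sketch below just computes it exactly on a hypothetical 5×5 grid):

```python
# Toy sketch of the k-step adjacency idea: restrict high-level subgoals
# to states reachable within k steps, instead of the whole goal space.
from collections import deque

def k_step_adjacent(start, k, width=5, height=5):
    """BFS ball of radius k around `start` in a 4-connected grid."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        (x, y), d = frontier.popleft()
        if d == k:
            continue                      # do not expand past radius k
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in seen:
                seen.add((nx, ny))
                frontier.append(((nx, ny), d + 1))
    return seen

adj = k_step_adjacent((2, 2), k=2)
print(len(adj))  # 13 candidate subgoals (a radius-2 diamond) vs 25 overall
```

The high-level policy then only proposes subgoals from this set, which is the shrinking of the search space the abstract describes.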
    Learning from All Vehicles. (arXiv:2203.11934v2 [cs.RO] UPDATED)
    In this paper, we present a system to train driving policies from experiences collected not just from the ego-vehicle, but all vehicles that it observes. This system uses the behaviors of other agents to create more diverse driving scenarios without collecting additional data. The main difficulty in learning from other vehicles is that there is no sensor information. We use a set of supervisory tasks to learn an intermediate representation that is invariant to the viewpoint of the controlling vehicle. This not only provides a richer signal at training time but also allows more complex reasoning during inference. Learning how all vehicles drive helps predict their behavior at test time and can avoid collisions. We evaluate this system in closed-loop driving simulations. Our system outperforms all prior methods on the public CARLA Leaderboard by a wide margin, improving driving score by 25 and route completion rate by 24 points. Our method won the 2021 CARLA Autonomous Driving challenge. Code and data are available at https://github.com/dotchen/LAV.
    Optimal Membership Inference Bounds for Adaptive Composition of Sampled Gaussian Mechanisms. (arXiv:2204.06106v1 [cs.CR])
Given a trained model and a data sample, membership-inference (MI) attacks predict whether the sample was in the model's training set. A common countermeasure against MI attacks is to utilize differential privacy (DP) during model training to mask the presence of individual examples. While this use of DP is a principled approach to limit the efficacy of MI attacks, there is a gap between the bounds provided by DP and the empirical performance of MI attacks. In this paper, we derive bounds for the \textit{advantage} of an adversary mounting a MI attack, and demonstrate tightness for the widely-used Gaussian mechanism. We further show bounds on the \textit{confidence} of MI attacks. Our bounds are much stronger than those obtained by DP analysis. For example, analyzing a setting of DP-SGD with $\epsilon=4$ would obtain an upper bound on the advantage of $\approx0.36$ based on our analyses, while the analysis of previous work that converts $\epsilon$ to membership inference bounds gives a bound of $\approx 0.97$. Finally, using our analysis, we provide MI metrics for models trained on the CIFAR10 dataset. To the best of our knowledge, our analysis provides the state-of-the-art membership inference bounds for privacy.
    LDPC codes: comparing cluster graphs to factor graphs. (arXiv:2204.06350v1 [cs.IT])
    We present a comparison study between cluster and factor graph representations of LDPC codes. In probabilistic graphical models, cluster graphs retain useful dependence between random variables during inference, which is advantageous in terms of computational cost, convergence speed, and accuracy of marginal probabilities. This study investigates these benefits in the context of LDPC codes and shows that a cluster graph representation outperforms the traditional factor graph representation.
    Deep Learning-based Framework for Automatic Cranial Defect Reconstruction and Implant Modeling. (arXiv:2204.06310v1 [eess.IV])
    The goal of this work is to propose a robust, fast, and fully automatic method for personalized cranial defect reconstruction and implant modeling. We propose a two-step deep learning-based method using a modified U-Net architecture to perform the defect reconstruction, and a dedicated iterative procedure to improve the implant geometry, followed by automatic generation of models ready for 3-D printing. We propose a cross-case augmentation based on imperfect image registration combining cases from different datasets. We perform ablation studies regarding different augmentation strategies and compare them to other state-of-the-art methods. We evaluate the method on three datasets introduced during the AutoImplant 2021 challenge, organized jointly with the MICCAI conference. We perform the quantitative evaluation using the Dice and boundary Dice coefficients, and the Hausdorff distance. The average Dice coefficient, boundary Dice coefficient, and the 95th percentile of Hausdorff distance are 0.91, 0.94, and 1.53 mm respectively. We perform an additional qualitative evaluation by 3-D printing and visualization in mixed reality to confirm the implant's usefulness. We propose a complete pipeline that enables one to create a cranial implant model ready for 3-D printing. The described method is a greatly extended version of the method that scored 1st place in all AutoImplant 2021 challenge tasks. We freely release the source code, which, together with the open datasets, makes the results fully reproducible. The automatic reconstruction of cranial defects may enable manufacturing personalized implants in a significantly shorter time, possibly allowing one to perform the 3-D printing process directly during a given intervention. Moreover, we show the usability of the defect reconstruction in mixed reality, which may further reduce the surgery time.
    Distributionally Robust Models with Parametric Likelihood Ratios. (arXiv:2204.06340v1 [cs.LG])
    As machine learning models are deployed ever more broadly, it becomes increasingly important that they are not only able to perform well on their training distribution, but also yield accurate predictions when confronted with distribution shift. The Distributionally Robust Optimization (DRO) framework proposes to address this issue by training models to minimize their expected risk under a collection of distributions, to imitate test-time shifts. This is most commonly achieved by instance-level re-weighting of the training objective to emulate the likelihood ratio with possible test distributions, which allows for estimating their empirical risk via importance sampling (assuming that they are subpopulations of the training distribution). However, re-weighting schemes in the literature are usually limited due to the difficulty of keeping the optimization problem tractable and the complexity of enforcing normalization constraints. In this paper, we show that three simple ideas -- mini-batch level normalization, a KL penalty and simultaneous gradient updates -- allow us to train models with DRO using a broader class of parametric likelihood ratios. In a series of experiments on both image and text classification benchmarks, we find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches, and that the method performs reliably well with little hyper-parameter tuning. Code to reproduce our experiments can be found at https://github.com/pmichel31415/P-DRO.
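    The three ingredients named above (mini-batch level normalization, a KL penalty, and simultaneous updates of model and adversary) can be illustrated with a toy evaluation of the re-weighted objective. This is a simplified sketch under assumed conventions, not the paper's implementation; `adv_logits` stands in for a parametric adversary's scores:

```python
import numpy as np

def dro_batch_loss(losses, adv_logits, kl_coef=1.0):
    """One DRO objective evaluation on a mini-batch (illustrative sketch).

    losses: per-example training losses.
    adv_logits: adversary scores; a softmax over the batch turns them into
    likelihood-ratio weights normalized at the mini-batch level.
    kl_coef: strength of the KL penalty keeping the weights near uniform.
    """
    losses = np.asarray(losses, dtype=float)
    z = adv_logits - np.max(adv_logits)       # numerically stable softmax
    w = np.exp(z) / np.exp(z).sum()
    n = len(losses)
    kl = np.sum(w * np.log(w * n + 1e-12))    # KL(w || uniform)
    return np.sum(w * losses) - kl_coef * kl  # adversary ascends, model descends

# With uniform adversary logits the objective reduces to the mean loss.
loss = dro_batch_loss([1.0, 2.0, 3.0], np.zeros(3))
```

In training, the adversary would take a gradient ascent step and the model a descent step on this quantity simultaneously.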
    Learning multiobjective rough terrain traversability. (arXiv:2203.16354v2 [cs.RO] UPDATED)
    We present a method that uses high-resolution topography data of rough terrain, and ground vehicle simulation, to predict traversability. Traversability is expressed as three independent measures: the ability to traverse the terrain at a target speed, energy consumption, and acceleration. The measures are continuous and reflect different objectives for planning that go beyond binary classification. A deep neural network is trained to predict the traversability measures from the local heightmap and target speed. To produce training data, we use an articulated vehicle with wheeled bogie suspensions and procedurally generated terrains. We evaluate the model on laser-scanned forest terrains, previously unseen by the model. The model predicts traversability with an accuracy of 90%. Predictions rely on features from the high-dimensional terrain data that surpass local roughness and slope relative to the heading. Correlations show that the three traversability measures are complementary to each other. With an inference speed 3000 times faster than the ground truth simulation and trivially parallelizable, the model is well suited for traversability analysis and optimal path planning over large areas.
    DL4SciVis: A State-of-the-Art Survey on Deep Learning for Scientific Visualization. (arXiv:2204.06504v1 [cs.GR])
    Since 2016, we have witnessed the tremendous growth of artificial intelligence+visualization (AI+VIS) research. However, existing survey papers on AI+VIS focus on visual analytics and information visualization, not scientific visualization (SciVis). In this paper, we survey related deep learning (DL) works in SciVis, specifically in the direction of DL4SciVis: designing DL solutions for solving SciVis problems. To stay focused, we primarily consider works that handle scalar and vector field data but exclude mesh data. We classify and discuss these works along six dimensions: domain setting, research task, learning type, network architecture, loss function, and evaluation metric. The paper concludes with a discussion of the remaining gaps to fill along the discussed dimensions and the grand challenges we need to tackle as a community. This state-of-the-art survey guides SciVis researchers in gaining an overview of this emerging topic and points out future directions to grow this research.
    Scalable Training of Language Models using JAX pjit and TPUv4. (arXiv:2204.06514v1 [cs.LG])
    Modern large language models require distributed training strategies due to their size. The challenges of efficiently and robustly training them are met with rapid developments on both software and hardware frontiers. In this technical report, we explore challenges and design decisions associated with developing a scalable training framework, and present a quantitative analysis of efficiency improvements coming from adopting new software and hardware solutions.
    Sentiment Analysis of Political Tweets for Israel using Machine Learning. (arXiv:2204.06515v1 [cs.IR])
    Sentiment analysis is a vital research topic in the field of computer science. With the accelerated development of information technology and social networks, a massive amount of comment-text data has been generated on web applications and social media platforms like Twitter. As a result, people have actively begun sharing both general information and political opinions, which makes analyzing public reactions important. Most researchers have used social media specifics or contents to analyze and predict public opinion concerning political events. This research proposes an analytical study using Israeli political Twitter data to interpret public opinion towards the Palestinian-Israeli conflict. The attitudes of ethnic groups and opinion leaders, expressed as tweets, are analyzed using machine learning algorithms such as the Support Vector Classifier (SVC), Decision Tree (DT), and Naive Bayes (NB). Finally, a comparative analysis is performed based on experimental results from the different models.
    Deep Probabilistic Time Series Forecasting using Augmented Recurrent Input for Dynamic Systems. (arXiv:2106.05848v2 [cs.LG] UPDATED)
    The demand for probabilistic time series forecasting has recently risen in various dynamic-system scenarios, for example, system identification and the prognostics and health management of machines. To this end, we combine advances in both deep generative models and state space models (SSM) to arrive at a novel, data-driven deep probabilistic sequence model. Specifically, we follow the popular encoder-decoder generative structure to build a recurrent neural network (RNN) assisted variational sequence model on an augmented recurrent input space, which can induce rich stochastic sequence dependency. Besides, in order to alleviate the inconsistency of the posterior between training and predicting, as well as to improve the mining of dynamic patterns, we (i) propose using a lagged hybrid output as input for the posterior at the next time step, which brings training and predicting into alignment; and (ii) further devise a generalized auto-regressive strategy that encodes all the historical dependencies for the posterior. Thereafter, we first investigate the methodological characteristics of the proposed deep probabilistic sequence model on toy cases, and then comprehensively demonstrate its superiority over existing deep probabilistic SSM models through extensive numerical experiments on eight system identification benchmarks from various dynamic systems. Finally, we apply our sequence model to a real-world centrifugal compressor forecasting problem, and again verify its outstanding performance by quantifying the time series predictive distribution.
    COIL: Constrained Optimization in Learned Latent Space -- Learning Representations for Valid Solutions. (arXiv:2202.02163v3 [cs.NE] UPDATED)
    Constrained optimization problems can be difficult because their search spaces have properties not conducive to search, e.g., multimodality, discontinuities, or deception. To address such difficulties, considerable research has been performed on creating novel evolutionary algorithms or specialized genetic operators. However, if the representation that defined the search space could be altered such that it only permitted valid solutions that satisfied the constraints, the task of finding the optimal would be made more feasible without any need for specialized optimization algorithms. We propose Constrained Optimization in Latent Space (COIL), which uses a VAE to generate a learned latent representation from a dataset comprising samples from the valid region of the search space according to a constraint, thus enabling the optimizer to find the objective in the new space defined by the learned representation. Preliminary experiments show promise: compared to an identical GA using a standard representation that cannot meet the constraints or find fit solutions, COIL with its learned latent representation can perfectly satisfy different types of constraints while finding high-fitness solutions.
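    The core idea of optimizing in a learned latent space can be sketched with a toy decoder standing in for a trained VAE. Everything here is illustrative (the decoder, objective, and random-search optimizer are stand-ins, not COIL's actual components); the point is that the search happens over `z`, and every candidate is obtained by decoding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained VAE decoder mapping a latent z to a candidate
# solution. In COIL the decoder is trained only on constraint-valid samples,
# so decoded candidates tend to land in the feasible region.
def decoder(z):
    return np.tanh(z)                 # toy decoder; outputs stay in (-1, 1)

def objective(x):
    return -np.sum((x - 0.5) ** 2)    # toy fitness to maximize, peaked at 0.5

# Simple random search over the latent space (a GA would do this step).
best_z, best_f = None, -np.inf
for _ in range(2000):
    z = rng.normal(size=4)
    f = objective(decoder(z))
    if f > best_f:
        best_z, best_f = z, f

solution = decoder(best_z)            # decode the best latent point found
```

Replacing random search with a GA over `z`, as in the paper, keeps the same structure: variation operators act on latent vectors, and fitness is always evaluated on decoded solutions.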
    Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection. (arXiv:2203.02194v2 [cs.CV] UPDATED)
    In some scenarios, a classifier is required to detect out-of-distribution samples far from its training data. With desirable characteristics, reconstruction autoencoder-based methods deal with this problem by using the input reconstruction error as a metric of novelty vs. normality. We formulate the essence of such an approach as a quadruplet domain translation with an intrinsic bias to only query for a proxy of conditional data uncertainty. Accordingly, an improvement direction is formalized as maximally compressing the autoencoder's latent space while ensuring its reconstructive power for acting as the described domain translator. From this, strategies are introduced, including semantic reconstruction, data certainty decomposition, and normalized L2 distance, to substantially improve the original methods, which together establish state-of-the-art performance on various benchmarks; e.g., the FPR@95%TPR of CIFAR-100 vs. TinyImagenet-crop on Wide-ResNet is 0.2%. Importantly, our method works without any additional data, hard-to-implement structures, or time-consuming pipelines, and without harming the classification accuracy on known classes.
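    The baseline mechanism such methods build on, reconstruction error as a novelty score with a normalized L2 distance, can be sketched as follows. The exact normalization scheme here is an assumption for illustration, and `reconstruct` stands in for a trained autoencoder's forward pass:

```python
import numpy as np

def ood_score(x, reconstruct):
    """Reconstruction-based OOD score (illustrative sketch).

    Uses an L2 distance between the input and its reconstruction,
    normalized by the input's norm so the score is insensitive to scale.
    Higher scores suggest the sample is out-of-distribution.
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(reconstruct(x), dtype=float)
    return np.linalg.norm(x - r) / (np.linalg.norm(x) + 1e-12)

# An identity "autoencoder" reconstructs in-distribution data perfectly,
# giving a score of zero.
score = ood_score([1.0, 2.0, 2.0], lambda v: v)
```

A real autoencoder trained on in-distribution data would reconstruct familiar inputs well (low score) and novel inputs poorly (high score); the paper's contribution is in shaping the latent space so this gap is reliable.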
    Reinforced MOOCs Concept Recommendation in Heterogeneous Information Networks. (arXiv:2203.11011v2 [cs.IR] UPDATED)
    Massive open online courses (MOOCs), which provide large-scale interactive participation and open access via the web, are becoming a popular medium for online and distance education. To help users have a better study experience, many MOOC platforms provide services that recommend courses to users. However, we argue that directly recommending a course to users ignores the expertise levels of different users. To fill this gap, this paper studies the problem of concept recommendation in a more fine-grained view. We propose a novel Heterogeneous Information Networks based Concept Recommender with Reinforcement Learning (HinCRec-RL) for concept recommendation in MOOCs. Specifically, we first formulate concept recommendation in MOOCs as a reinforcement learning problem to better model the dynamic interaction between users and knowledge concepts. In addition, to mitigate the data sparsity issue that also exists in many other recommendation tasks, we consider a heterogeneous information network (HIN) among users, courses, videos, and concepts to better learn the semantic representation of users. In particular, we use meta-paths on the HIN to guide the propagation of users' preferences and propose a heterogeneous graph attention network to represent the meta-paths. To validate the effectiveness of our proposed approach, we conduct comprehensive experiments on a real-world dataset from XuetangX, a popular MOOC platform from China. The promising results show that our proposed approach can outperform other baselines.
    A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes. (arXiv:2204.06164v1 [eess.AS])
    In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without loss of quality. Namely, with the dynamic cascaded encoder model, we explore three techniques to maximally boost the performance of each model size: 1) Use separate decoders for each sub-model while sharing the encoders; 2) Use funnel-pooling to improve the encoder efficiency; 3) Balance the size of causal and non-causal encoders to improve quality and fit deployment constraints. Overall, the proposed large-medium model has 30% smaller size and reduces power consumption by 33%, compared to the baseline cascaded encoder model. The triple-size model that unifies the large, medium, and small models achieves 37% total size reduction with minimal quality loss, while substantially reducing the engineering efforts of having separate models.
    Adaptive Height Optimisation for Cellular-Connected UAVs using Reinforcement Learning. (arXiv:2007.13695v3 [eess.SP] UPDATED)
    Providing reliable connectivity to cellular-connected UAVs can be very challenging; their performance highly depends on the nature of the surrounding environment, such as the density and heights of the ground BSs. On the other hand, tall buildings might block undesired interference signals from ground BSs, thereby improving the connectivity between the UAVs and their serving BSs. To address the connectivity of UAVs in such environments, this paper proposes an RL algorithm to dynamically optimise the height of a UAV as it moves through the environment, with the goal of increasing the throughput or spectrum efficiency that it experiences. The proposed solution is evaluated in two settings: a series of generated environments in which we vary the BS and building densities, and a scenario using real-world data obtained from an experiment in Dublin, Ireland. Results show that our proposed RL-based solution improves UAV QoS by 6% to 41%, depending on the scenario. We also conclude that, when flying at heights above the buildings, building density variation has no impact on UAV QoS. On the other hand, BS density can negatively impact UAV QoS, with higher numbers of BSs generating more interference and deteriorating UAV performance.
    QU-net++: Image Quality Detection Framework for Segmentation of Medical 3D Image Stacks. (arXiv:2110.14181v4 [eess.IV] UPDATED)
    Automated segmentation of pathological regions of interest aids medical image diagnostics and follow-up care. However, accurate pathological segmentation requires high-quality annotated data, which can be both costly and time-intensive to generate. In this work, we propose an automated two-step method that detects a minimal image subset required to train segmentation models by evaluating the quality of medical images from 3D image stacks using a U-net++ model. The images flagged as inadequately segmented can then be annotated and used to fully train a U-net-based segmentation model. The proposed QU-net++ model detects this lack of segmentation quality based on the disagreement between the segmentations produced by the final two output layers. The proposed model isolates around 10% of the slices per 3D image stack and can scale across imaging modalities to segment cysts in OCT images and ground glass opacity (GGO) in lung CT images, with Dice scores in the range 0.56-0.72. Thus, the proposed method can be applied to cost-effective multi-modal pathology segmentation tasks.
    Why KDAC? A general activation function for knowledge discovery. (arXiv:2111.13858v3 [cs.LG] UPDATED)
    Deep learning oriented named entity recognition (DNER) has gradually become the paradigm of knowledge discovery, which greatly promotes domain intelligence. However, current activation functions for DNER fail to address gradient vanishing, the lack of negative outputs, or the existence of non-differentiable points, which may impede knowledge exploration through the omission and incomplete representation of latent semantics. To break through this dilemma, we present a novel activation function termed KDAC. In detail, KDAC is an aggregation function with multiple conversion modes. The backbone of the activation region is the interaction between exponent and linearity, and both ends extend through adaptive linear divergence, which surmounts the obstacles of gradient vanishing and no negative output. Crucially, the non-differentiable points are detected and eliminated by an approximate smoothing algorithm. KDAC has a series of desirable properties, including nonlinearity, a stable near-linear transformation and derivative, and a dynamic style. We perform experiments based on the BERT-BiLSTM-CNN-CRF model on six benchmark datasets containing different domain knowledge: Weibo, Clinical, E-commerce, Resume, HAZOP, and People's Daily. The evaluation results show that KDAC is advanced and effective, and can provide more generalized activation to improve the performance of DNER. We hope that KDAC can be exploited as a promising activation function for the construction of knowledge.
    Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability. (arXiv:2204.06425v1 [cs.SE])
    Machine learning models have been widely developed, released, and adopted in numerous applications. Meanwhile, the documentation practice for machine learning models often falls short of established practices for traditional software components, which impedes model accountability, inadvertently abets inappropriate or misuse of models, and may trigger negative social impact. Recently, model cards, a template for documenting machine learning models, have attracted notable attention, but their impact on the practice of model documentation is unclear. In this work, we examine publicly available model cards and other similar documentation. Our analysis reveals a substantial gap between the suggestions made in the original model card work and the content in actual documentation. Motivated by this observation and literature on fields such as software documentation, interaction design, and traceability, we further propose a set of design guidelines that aim to support the documentation practice for machine learning models including (1) the collocation of documentation environment with the coding environment, (2) nudging the consideration of model card sections during model development, and (3) documentation derived from and traced to the source. We designed a prototype tool named DocML following those guidelines to support model development in computational notebooks. A lab study reveals the benefit of our tool to shift the behavior of data scientists towards documentation quality and accountability.
    Modelling Evolutionary and Stationary User Preferences for Temporal Sets Prediction. (arXiv:2204.05490v2 [cs.LG] UPDATED)
    Given a sequence of sets, where each set is associated with a timestamp and contains an arbitrary number of elements, the task of temporal sets prediction aims to predict the elements in the subsequent set. Previous studies of temporal sets prediction mainly capture each user's evolutionary preference by learning from his/her own sequence. Although insightful, we argue that: 1) the collaborative signals latent in different users' sequences are essential but have not been exploited; 2) users also tend to show stationary preferences, which existing methods fail to consider. To this end, we propose an integrated learning framework to model both the evolutionary and the stationary preferences of users for temporal sets prediction, which first constructs a universal sequence by chronologically arranging all the user-set interactions, and then learns on each user-set interaction. In particular, for each user-set interaction, we first design an evolutionary user preference modelling component to track the user's time-evolving preference and exploit the latent collaborative signals among different users. This component maintains a memory bank to store memories of the related user and elements, and continuously updates their memories based on the currently encoded messages and the past memories. Then, we devise a stationary user preference modelling module to discover each user's personalized characteristics according to the historical sequence, which adaptively aggregates the previously interacted elements from dual perspectives with the guidance of the user's and elements' embeddings. Finally, we develop a set-batch algorithm to improve the model efficiency, which can create time-consistent batches in advance and achieve 3.5x training speedups on average. Experiments on real-world datasets demonstrate the effectiveness and good interpretability of our approach.
    AHP: Learning to Negative Sample for Hyperedge Prediction. (arXiv:2204.06353v1 [cs.LG])
    Hypergraphs (i.e., sets of hyperedges) naturally represent group relations (e.g., researchers co-authoring a paper and ingredients used together in a recipe), each of which corresponds to a hyperedge (i.e., a subset of nodes). Predicting future or missing hyperedges bears significant implication for many applications (e.g., collaboration and recipe recommendation). What makes hyperedge prediction particularly challenging is the vast number of non-hyperedge subsets, which grows exponentially with the number of nodes. Since it is prohibitive to use all of them as negative examples for model training, it is inevitable to sample a very small portion of them, and to this end, heuristic sampling schemes have been employed. However, trained models suffer from poor generalization capability for examples of different natures. In this paper, we propose AHP, an adversarial training-based hyperedge-prediction method. It learns to sample negative examples without relying on any heuristic schemes. Using six real hypergraphs, we show that AHP generalizes better to negative examples of various natures. It yields up to 28.2% higher AUROC than best existing methods and often even outperforms its variants with sampling schemes tailored to test sets.
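    A typical heuristic negative sampler of the kind AHP aims to replace corrupts a true hyperedge by swapping one member for an outside node. A minimal sketch of that baseline heuristic (the function and its signature are illustrative, not from the paper):

```python
import random

def corrupt_hyperedge(hyperedge, nodes, rng):
    """Heuristic negative sampler for hyperedge prediction: replace one
    member of a true hyperedge with a random node outside it. This is the
    kind of fixed heuristic that AHP replaces with a learned, adversarially
    trained sampler."""
    neg = set(hyperedge)
    outside = rng.choice(sorted(set(nodes) - neg))  # node not in the hyperedge
    dropped = rng.choice(sorted(neg))               # member to remove
    neg.remove(dropped)
    neg.add(outside)
    return neg

rng = random.Random(0)
neg = corrupt_hyperedge({1, 2, 3}, range(6), rng)   # same size, one node swapped
```

Because such fixed schemes generate negatives of a single "nature", models trained on them generalize poorly to other negative distributions, which is the gap the learned sampler targets.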
    Label Augmentation with Reinforced Labeling for Weak Supervision. (arXiv:2204.06436v1 [cs.LG])
    Weak supervision (WS) is an alternative to traditional supervised learning that addresses the need for ground truth. Data programming is a practical WS approach that allows programmatically labeling data samples using labeling functions (LFs) instead of hand-labeling each data point. However, the existing approach fails to fully exploit the domain knowledge encoded in the LFs, especially when the LFs' coverage is low. This is because the common data programming pipeline neglects to utilize data features during the generative process. This paper proposes a new approach called reinforced labeling (RL). Given an unlabeled dataset and a set of LFs, RL augments the LFs' outputs to cases not covered by the LFs based on similarities among samples. Thus, RL can lead to higher labeling coverage for training an end classifier. Experiments on several domains (classification of YouTube comments, wine quality, and weather prediction) show considerable gains. The new approach produces significant performance improvements, leading up to +21 points in accuracy and +61 points in F1 score compared to the state-of-the-art data programming approach.
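    The core augmentation step, propagating LF outputs to uncovered samples by feature similarity, can be sketched as nearest-covered-neighbor propagation. The radius rule below is an illustrative stand-in for the paper's similarity criterion, and `-1` denotes an abstaining LF:

```python
import numpy as np

def propagate_labels(X, lf_labels, radius=1.0):
    """Similarity-based label augmentation (illustrative sketch of the idea).

    lf_labels: labeling-function outputs per sample, with -1 meaning "abstain".
    Uncovered samples copy the label of their nearest covered neighbor,
    but only if it lies within `radius` in feature space.
    """
    X = np.asarray(X, dtype=float)
    out = np.array(lf_labels)
    covered = np.where(out != -1)[0]
    for i in np.where(out == -1)[0]:
        d = np.linalg.norm(X[covered] - X[i], axis=1)
        nearest = covered[np.argmin(d)]
        if d.min() <= radius:
            out[i] = lf_labels[nearest]   # extend coverage to a similar sample
    return out

# Sample 1 is close to covered sample 0 and inherits its label;
# sample 2 is too far away and stays unlabeled.
labels = propagate_labels([[0.0], [0.1], [5.0]], [1, -1, -1], radius=0.5)
```

The augmented labels then feed the usual data programming pipeline, raising coverage for the end classifier.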
    Clinical trial site matching with improved diversity using fair policy learning. (arXiv:2204.06501v1 [cs.LG])
    The ongoing pandemic has highlighted the importance of reliable and efficient clinical trials in healthcare. Trial sites, where the trials are conducted, are chosen mainly based on feasibility in terms of medical expertise and access to a large group of patients. More recently, the issue of diversity and inclusion in clinical trials has gained importance. Different patient groups may experience the effects of a medical drug or treatment differently and hence need to be included in clinical trials. These groups could be based on ethnicity, co-morbidities, age, or economic factors. Thus, designing a method for trial site selection that accounts for both feasibility and diversity is a crucial and urgent goal. In this paper, we formulate this problem as a ranking problem with fairness constraints. Using principles of fairness in machine learning, we learn a model that maps a clinical trial description to a ranked list of potential trial sites. Unlike existing fairness frameworks, the group membership of each trial site is non-binary: each trial site may have access to patients from multiple groups. We propose fairness criteria based on demographic parity to address such a multi-group membership scenario. We test our method on 480 real-world clinical trials and show that our model results in a list of potential trial sites that provides access to a diverse set of patients while also ensuring a high number of enrolled patients.
    Distilling the Knowledge of Romanian BERTs Using Multiple Teachers. (arXiv:2112.12650v3 [cs.CL] UPDATED)
    Running large-scale pre-trained language models in computationally constrained environments remains a challenging problem yet to be addressed, while transfer learning from these models has become prevalent in Natural Language Processing tasks. Several solutions, including knowledge distillation, network quantization, or network pruning have been previously proposed; however, these approaches focus mostly on the English language, thus widening the gap when considering low-resource languages. In this work, we introduce three light and fast versions of distilled BERT models for the Romanian language: Distil-BERT-base-ro, Distil-RoBERT-base, and DistilMulti-BERT-base-ro. The first two models resulted from the individual distillation of knowledge from two base versions of Romanian BERTs available in literature, while the last one was obtained by distilling their ensemble. To our knowledge, this is the first attempt to create publicly available Romanian distilled BERT models, which were thoroughly evaluated on five tasks: part-of-speech tagging, named entity recognition, sentiment analysis, semantic textual similarity, and dialect identification. Our experimental results argue that the three distilled models offer performance comparable to their teachers, while being twice as fast on a GPU and ~35% smaller. In addition, we further test the similarity between the predictions of our students versus their teachers by measuring their label and probability loyalty, together with regression loyalty - a new metric introduced in this work.
    Estimation of stellar atmospheric parameters from LAMOST DR8 low-resolution spectra with 20$\leq$SNR$<$30. (arXiv:2204.06301v1 [astro-ph.GA])
    The accuracy of estimated stellar atmospheric parameters decreases markedly with decreasing spectral signal-to-noise ratio (SNR), and there is a huge number of such observations, especially with SNR$<$30. It is therefore helpful to improve parameter estimation for these spectra, and this work studies the ($T_\texttt{eff}, \log~g$, [Fe/H]) estimation problem for LAMOST DR8 low-resolution spectra with 20$\leq$SNR$<$30. We propose a data-driven method based on machine learning techniques. Firstly, the scheme detects stellar atmospheric parameter-sensitive features in the spectra using the Least Absolute Shrinkage and Selection Operator (LASSO), rejecting ineffective data components and irrelevant data. Secondly, a Multi-layer Perceptron (MLP) is used to estimate the stellar atmospheric parameters from the LASSO features. Finally, the performance of LASSO-MLP is evaluated by computing and analyzing the consistency between its estimates and the reference values from APOGEE (Apache Point Observatory Galactic Evolution Experiment) high-resolution spectra. Experiments show that the Mean Absolute Errors (MAE) of $T_\texttt{eff}, \log~g$, [Fe/H] are reduced from the LASP (137.6 K, 0.195 dex, 0.091 dex) to LASSO-MLP (84.32 K, 0.137 dex, 0.063 dex), which indicates a clear improvement in stellar atmospheric parameter estimation. In addition, this work estimates the stellar atmospheric parameters for 1,162,760 low-resolution spectra with 20$\leq$SNR$<$30 from LAMOST DR8 using LASSO-MLP, and releases the estimation catalog, learned model, experimental code, trained model, training data, and test data for scientific exploration and algorithm study.
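    The LASSO feature-selection stage of such a pipeline can be sketched on synthetic data. The ISTA solver, regularization strength, and selection threshold below are illustrative stand-ins (the paper does not specify these details), and a trained MLP would then be fit on the selected columns:

```python
import numpy as np

def lasso_ista(X, y, lam=0.1, lr=0.01, steps=2000):
    """LASSO via iterative soft-thresholding (ISTA). The sparse weight
    vector marks the parameter-sensitive features; this is a generic
    stand-in for the paper's LASSO step."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)                      # least-squares gradient
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # 10 candidate spectral features
y = 3.0 * X[:, 0] - 2.0 * X[:, 3]       # only features 0 and 3 carry signal
w = lasso_ista(X, y)
selected = np.where(np.abs(w) > 0.1)[0] # columns passed on to the regressor
```

An MLP regressor trained on `X[:, selected]` would then complete the pipeline, mapping the reduced feature set to the atmospheric parameters.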
    Autonomy and Perception for Space Mining. (arXiv:2109.12109v3 [cs.RO] UPDATED)
    Future Moon bases will likely be constructed using resources mined from the surface of the Moon. The difficulty of maintaining a human workforce on the Moon and communications lag with Earth means that mining will need to be conducted using collaborative robots with a high degree of autonomy. In this paper, we describe our solution for Phase 2 of the NASA Space Robotics Challenge, which provided a simulated lunar environment in which teams were tasked to develop software systems to achieve autonomous collaborative robots for mining on the Moon. Our 3rd place and innovation award winning solution shows how machine learning-enabled vision could alleviate major challenges posed by the lunar environment towards autonomous space mining, chiefly the lack of satellite positioning systems, hazardous terrain, and delicate robot interactions. A robust multi-robot coordinator was also developed to achieve long-term operation and effective collaboration between robots.
    A Statistical Learning View of Simple Kriging. (arXiv:2202.07365v3 [stat.ML] UPDATED)
    In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly and guarantees of the generalization capacity of predictive rules learned from such data are left to establish. We analyze here the simple Kriging task, the flagship problem in Geostatistics: the values of a square integrable random field $X=\{X_s\}_{s\in S}$, $S\subset \mathbb{R}^2$, with unknown covariance structure are to be predicted with minimum quadratic risk, based upon observing a single realization of the spatial process at a finite number of locations $s_1,\; \ldots,\; s_n$ in $S$. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, due to the non i.i.d. nature of the spatial data $X_{s_1},\; \ldots,\; X_{s_n}$ involved. In this article, nonasymptotic bounds of order $O_{\mathbb{P}}(1/n)$ are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid. These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments and hopefully pave the way for further developments in statistical learning based on spatial data.
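    The plug-in simple-kriging predictor analyzed here can be sketched in NumPy, assuming for illustration a known isotropic exponential covariance and observations on a regular grid (the paper's setting treats the covariance as unknown; the scale and grid size below are arbitrary choices):

```python
import numpy as np

def exp_cov(A, B, scale=0.3):
    """Isotropic exponential covariance between two sets of 2-D locations."""
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    return np.exp(-d / scale)

# Observation locations s_1, ..., s_n on a regular grid in S ⊂ R².
g = np.linspace(0.0, 1.0, 5)
S_obs = np.array([(a, b) for a in g for b in g])

# A single realization of a centered Gaussian field at the observed locations.
rng = np.random.default_rng(1)
K = exp_cov(S_obs, S_obs) + 1e-10 * np.eye(len(S_obs))  # jitter for stability
X_obs = np.linalg.cholesky(K) @ rng.normal(size=len(S_obs))

def krige(s_new):
    """Minimum-quadratic-risk linear predictor: X̂(s) = k(s)ᵀ K⁻¹ X."""
    k = exp_cov(S_obs, np.atleast_2d(s_new))[:, 0]
    return k @ np.linalg.solve(K, X_obs)

print(krige([0.3, 0.7]))
```

    A sanity check of the construction: with a noiseless covariance the predictor interpolates, reproducing the observed value at any observed location.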
    Generalization Error Bounds for Multiclass Sparse Linear Classifiers. (arXiv:2204.06264v1 [math.ST])
    We consider high-dimensional multiclass classification by sparse multinomial logistic regression. Unlike binary classification, in the multiclass setup one can think about an entire spectrum of possible notions of sparsity associated with different structural assumptions on the regression coefficients matrix. We propose a computationally feasible feature selection procedure based on penalized maximum likelihood with convex penalties capturing a specific type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity, and show that with the properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds (in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers. The developed approach is general and can be adapted to other types of sparsity as well.
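    Among the sparsity notions considered, global (entry-wise) sparsity can be illustrated with off-the-shelf $\ell_1$-penalized multinomial logistic regression; row-wise and low-rank sparsity would require specialized solvers. A minimal sketch on synthetic data (tuning parameters are illustrative, not the minimax-optimal choices derived in the paper):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy high-dimensional multiclass problem: only a few informative features.
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)

# The l1 penalty enforces *global* sparsity on the coefficient matrix,
# zeroing out individual entries across all classes.
clf = LogisticRegression(penalty="l1", solver="saga", C=0.5,
                         max_iter=5000).fit(X, y)
zero_frac = np.mean(clf.coef_ == 0.0)
print(round(zero_frac, 2), round(clf.score(X, y), 2))
```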
    Disentangling Autoencoders (DAE). (arXiv:2202.09926v2 [cs.LG] UPDATED)
    Noting the importance of factorizing (or disentangling) the latent space, we propose a novel, non-probabilistic disentangling framework for autoencoders, based on the principles of symmetry transformations in group theory. To the best of our knowledge, this is the first deterministic model that aims to achieve disentanglement with autoencoders and no regularizers. The proposed model is compared to seven state-of-the-art generative models based on autoencoders and evaluated on five supervised disentanglement metrics. The experimental results show that the proposed model achieves better disentanglement when the variances of the features differ. We believe this model opens a new direction for disentanglement learning based on autoencoders without regularizers.
    Reinforcement Learning on Graph: A Survey. (arXiv:2204.06127v1 [cs.LG])
    Graph mining tasks arise in many application domains, ranging from social networks and transportation to e-commerce, and have received great attention from the theory and algorithm design communities in recent years. There has also been pioneering work applying the intensively studied reinforcement learning (RL) techniques to graph data mining tasks. However, these graph mining algorithms and RL models are dispersed across different research areas, which makes it hard to compare them with each other. In this survey, we provide a comprehensive overview of RL models and graph mining, and generalize these algorithms under the unified formulation of Graph Reinforcement Learning (GRL). We further discuss the applications of GRL methods across various domains and summarize their method descriptions, open-source code, and benchmark datasets. Finally, we propose important directions and challenges to be addressed in the future. This is the latest comprehensive survey of the GRL literature, providing a global view for researchers in the field as well as a learning resource for those outside it. In addition, we maintain an online open-source repository for both interested researchers who want to enter this rapidly developing domain and experts who would like to compare GRL methods.
    Production federated keyword spotting via distillation, filtering, and joint federated-centralized training. (arXiv:2204.06322v1 [eess.AS])
    We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device training caches, we employed joint federated-centralized training. And to learn in the absence of curated labels on-device, we formulated a confidence filtering strategy based on user-feedback signals for federated distillation. These techniques created models that significantly improved quality metrics in offline evaluations and user-experience metrics in live A/B experiments.
    Modeling and Analysis of Intermittent Federated Learning Over Cellular-Connected UAV Networks. (arXiv:2110.07077v3 [cs.LG] UPDATED)
    Federated learning (FL) is a promising distributed learning technique, particularly suitable for wireless learning scenarios, since it can accomplish a learning task without transporting raw data, thereby preserving data privacy and lowering network resource consumption. However, existing work on FL over wireless networks does not thoroughly study the fundamental performance of FL when the network suffers communication outages due to channel impairment and network interference. To accurately characterize the performance of FL over wireless networks, this paper proposes a novel intermittent FL model over a cellular-connected unmanned aerial vehicle (UAV) network, which captures communication outages from the UAVs (clients) to their server and data heterogeneity among the UAVs' datasets. We propose an analytically tractable framework to derive the uplink outage probability and use it to devise a simulation-based approach for evaluating the performance of the proposed intermittent FL model. Our findings reveal how the intermittent FL model is affected by uplink communication outages and UAV deployment. Extensive numerical simulations show the consistency between the simulated and analytical performance of the proposed intermittent FL model.
    FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning. (arXiv:2204.05562v2 [cs.LG] UPDATED)
    The incredible development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and existing frameworks such as TFF and FATE have made deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of an FGL-related framework increases the effort required for reproducible research and for deployment in real-world applications. Motivated by this strong demand, in this paper we first discuss the challenges in creating an easy-to-use FGL package and accordingly present our implemented package, FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) a comprehensive DataZoo and ModelZoo for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities. We validate the effectiveness of FS-G through extensive experiments, which simultaneously yield many valuable insights about FGL for the community. Moreover, we employ FS-G to serve FGL applications in real-world e-commerce scenarios, where the attained improvements indicate great potential business benefits. We publicly release FS-G, as submodules of FederatedScope, at https://github.com/alibaba/FederatedScope to promote FGL research and enable broad applications that would otherwise be infeasible due to the lack of a dedicated package.
    Baseline Computation for Attribution Methods Based on Interpolated Inputs. (arXiv:2204.06120v1 [cs.CV])
    We discuss a way to find a well behaved baseline for attribution methods that work by feeding a neural network with a sequence of interpolated inputs between two given inputs. Then, we test it with our novel Riemann-Stieltjes Integrated Gradient-weighted Class Activation Mapping (RSI-Grad-CAM) attribution method.
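    Attribution from interpolated inputs can be illustrated with a plain integrated-gradients computation on an analytic function. This is the generic scheme the baseline question applies to, not the authors' RSI-Grad-CAM method, and the zero baseline used below is just a common default:

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=101):
    """Average the gradient along the straight path from baseline to x,
    then scale coordinate-wise by (x - baseline)."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    path = baseline + alphas * (x - baseline)   # sequence of interpolated inputs
    avg_grad = grad_f(path).mean(axis=0)
    return (x - baseline) * avg_grad

# Analytic example: f(x) = sum(x_i^2), so grad f = 2x and the exact
# attribution of coordinate i is x_i^2 - baseline_i^2.
f = lambda z: (z ** 2).sum(axis=-1)
grad_f = lambda z: 2 * z

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros(3)
attr = integrated_gradients(grad_f, x, baseline)
print(attr)
```

    The completeness property (attributions summing to f(x) − f(baseline)) is what makes the choice of a well-behaved baseline matter.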
    Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity. (arXiv:2204.06242v1 [stat.ML])
    Many real-world systems are described not only by data from a single source but via multiple data views. For example, in genomic medicine, a patient can be described by data from different molecular layers. This raises the need for multi-view models that are able to disentangle variation within and across data views in an interpretable manner. Latent variable models with structured sparsity are a commonly used tool to address this modeling task but interpretability is cumbersome since it requires a direct inspection and interpretation of each factor via a specialized domain expert. Here, we propose MuVI, a novel approach for domain-informed multi-view latent variable models, facilitating the analysis of multi-view data in an inherently explainable manner. We demonstrate that our model (i) is able to integrate noisy domain expertise in form of feature sets, (ii) is robust to noise in the encoded domain knowledge, (iii) results in identifiable factors and (iv) is able to infer interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.
    A quantum generative model for multi-dimensional time series using Hamiltonian learning. (arXiv:2204.06150v1 [quant-ph])
    Synthetic data generation has proven to be a promising solution for addressing data availability issues in various domains. Even more challenging is the generation of synthetic time series data, where one has to preserve temporal dynamics, i.e., the generated time series must respect the original relationships between variables across time. Recently proposed techniques such as generative adversarial networks (GANs) and quantum-GANs lack the ability to attend to the time series specific temporal correlations adequately. We propose using the inherent nature of quantum computers to simulate quantum dynamics as a technique to encode such features. We start by assuming that a given time series can be generated by a quantum process, after which we proceed to learn that quantum process using quantum machine learning. We then use the learned model to generate out-of-sample time series and show that it captures unique and complex features of the learned time series. We also study the class of time series that can be modeled using this technique. Finally, we experimentally demonstrate the proposed algorithm on an 11-qubit trapped-ion quantum machine.
    Organization of a Latent Space structure in VAE/GAN trained by navigation data. (arXiv:2102.01852v3 [cs.LG] UPDATED)
    We present a novel artificial cognitive mapping system using generative deep neural networks, namely a variational autoencoder/generative adversarial network (VAE/GAN), which can map input images to latent vectors and generate temporal sequences internally. The results show that, after training, the distance between predicted images is reflected in the distance between the corresponding latent vectors. This indicates that the latent space self-organizes to reflect the proximity structure of the dataset and may provide a mechanism through which many aspects of cognition are spatially represented. The present study also allows the network to internally generate temporal sequences analogous to the hippocampal replay/pre-play ability: the VAE alone produces only near-accurate replays of past experiences, but introducing the GAN couples the generated sequences with instability and novelty.
    FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations. (arXiv:2204.06508v1 [cs.CL])
    Despite recent improvements in abstractive summarization, most current approaches generate summaries that are not factually consistent with the source document, severely restricting their trust and usage in real-world applications. Recent works have shown promising improvements in factuality error identification using text or dependency arc entailments; however, they do not consider the entire semantic graph simultaneously. To this end, we propose FactGraph, a method that decomposes the document and the summary into structured meaning representations (MR), which are more suitable for factuality evaluation. MRs describe core semantic concepts and their relations, aggregating the main content in both document and summary in a canonical form, and reducing data sparsity. FactGraph encodes such graphs using a graph encoder augmented with structure-aware adapters to capture interactions among the concepts based on the graph connectivity, along with text representations using an adapter-based text encoder. Experiments on different benchmarks for evaluating factuality show that FactGraph outperforms previous approaches by up to 15%. Furthermore, FactGraph improves performance on identifying content verifiability errors and better captures subsentence-level factual inconsistencies.
    Challenges and Opportunities of Edge AI for Next-Generation Implantable BMIs. (arXiv:2204.02362v2 [cs.AI] UPDATED)
    Neuroscience and neurotechnology are currently being revolutionized by artificial intelligence (AI) and machine learning. AI is widely used to study and interpret neural signals (analytical applications), assist people with disabilities (prosthetic applications), and treat underlying neurological symptoms (therapeutic applications). In this brief, we will review the emerging opportunities of on-chip AI for the next-generation implantable brain-machine interfaces (BMIs), with a focus on state-of-the-art prosthetic BMIs. Major technological challenges for the effectiveness of AI models will be discussed. Finally, we will present algorithmic and IC design solutions to enable a new generation of AI-enhanced and high-channel-count BMIs.
    On the dynamics of credit history and social interaction features, and their impact on creditworthiness assessment performance. (arXiv:2204.06122v1 [cs.SI])
    For more than half a century, credit risk management has used credit scoring models in each of its well-defined stages. Application scoring is used to decide whether to grant a credit, while behavioral scoring is used mainly for portfolio management and to take preventive actions in case of default signals. In both cases, network data has recently been shown to be valuable for increasing the predictive power of these models, especially when the borrower's historical data is scarce or unavailable. This study aims to understand the dynamics of creditworthiness assessment performance and how it is influenced by credit history, repayment behavior, and social network features. To accomplish this, we introduce a machine learning classification framework to analyze 97,000 individuals and companies from the moment they obtained their first loan to 12 months afterward. Our novel and massive dataset allows us to characterize each borrower according to their credit behavior and their social and economic relationships. Our research shows that borrowers' history increases performance at a decreasing rate during the first six months and then stabilizes. The most notable effect of social network features on performance occurs at loan application; in personal scoring this effect persists for a few months, while in business scoring it adds value throughout the study period. These findings are of great value for improving credit risk management and optimizing the use of traditional information and alternative data sources.
    Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation. (arXiv:2204.06439v1 [cs.SD])
    Speech dereverberation is often an important requirement in robust speech processing tasks. Supervised deep learning (DL) models give state-of-the-art performance for single-channel speech dereverberation. Temporal convolutional networks (TCNs) are commonly used for sequence modelling in speech enhancement tasks. A feature of TCNs is that they have a receptive field (RF), dependent on the specific model configuration, which determines the number of input frames that can be observed to produce an individual output frame. It has been shown that TCNs are capable of dereverberating simulated speech data; however, a thorough analysis, especially with focus on the RF, is still lacking in the literature. This paper analyses dereverberation performance depending on the model size and the RF of TCNs. Experiments using the WHAMR corpus, extended to include room impulse responses (RIRs) with larger RT60 values, demonstrate that a larger RF can yield significant performance improvements when training smaller TCN models. It is also demonstrated that TCNs benefit from a wider RF when dereverberating RIRs with larger RT60 values.
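    For a stack of dilated, non-strided convolutions, the RF follows the standard formula RF = 1 + Σ (k − 1)·d over the layers. The sketch below computes it for an illustrative exponentially dilated TCN; the block structure and hyperparameters are assumptions, not the paper's configurations:

```python
def receptive_field(kernel_size, dilations):
    """Input frames visible to one output frame for a stack of dilated,
    non-strided convolutions: RF = 1 + sum((k - 1) * d over the layers)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# A typical TCN repeats exponentially dilated blocks, e.g. R repeats of
# dilations 1, 2, 4, ..., 2^(X-1); the values here are illustrative only.
R, X, k = 2, 6, 3
dilations = [2 ** i for i in range(X)] * R
print(receptive_field(k, dilations))  # frames of input context
```

    Scaling R, X, or k trades model size against RF, which is exactly the axis the paper's analysis varies.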
    Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information. (arXiv:2103.07084v2 [stat.ML] UPDATED)
    Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies showed that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Although existing methods learn diverse solutions by using the mutual information as unsupervised rewards, such an approach often suffers from the bias of the gradient estimator induced by value function approximation. In this study, we propose a novel method that can learn diverse solutions without suffering the bias problem. In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, instead of using the mutual information as unsupervised rewards as in previous studies. Through extensive experiments on robot locomotion tasks, we demonstrate that the proposed method successfully learns an infinite set of diverse solutions by learning continuous latent variables, which is more challenging than learning a finite number of solutions. Subsequently, we show that our method enables more effective few-shot adaptation compared with existing methods.
    A pipeline and comparative study of 12 machine learning models for text classification. (arXiv:2204.06518v1 [cs.IR])
    Text-based communication is highly favoured as a communication method, especially in business environments. As a result, it is often abused by sending malicious messages, e.g., spam emails, to deceive users into relaying personal information, including online accounts credentials or banking details. For this reason, many machine learning methods for text classification have been proposed and incorporated into the services of most email providers. However, optimising text classification algorithms and finding the right tradeoff on their aggressiveness is still a major research problem. We present an updated survey of 12 machine learning text classifiers applied to a public spam corpus. A new pipeline is proposed to optimise hyperparameter selection and improve the models' performance by applying specific methods (based on natural language processing) in the preprocessing stage. Our study aims to provide a new methodology to investigate and optimise the effect of different feature sizes and hyperparameters in machine learning classifiers that are widely used in text classification problems. The classifiers are tested and evaluated on different metrics including F-score (accuracy), precision, recall, and run time. By analysing all these aspects, we show how the proposed pipeline can be used to achieve a good accuracy towards spam filtering on the Enron dataset, a widely used public email corpus. Statistical tests and explainability techniques are applied to provide a robust analysis of the proposed pipeline and interpret the classification outcomes of the 12 machine learning models, also identifying words that drive the classification results. Our analysis shows that it is possible to identify an effective machine learning model to classify the Enron dataset with an F-score of 94%.
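    A preprocessing-plus-classifier pipeline of the kind described can be sketched with scikit-learn. The toy corpus below stands in for the Enron dataset, and the vectorizer, classifier, and feature-size cap are illustrative choices rather than the paper's tuned configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny illustrative corpus (the paper uses the public Enron spam corpus).
texts = ["win free money now", "claim your free prize money",
         "free lottery winner claim now", "cheap meds free offer",
         "meeting agenda for monday", "please review the attached report",
         "quarterly budget meeting notes", "schedule a call for tuesday"]
labels = ["spam"] * 4 + ["ham"] * 4

# NLP preprocessing (TF-IDF with a capped feature size) and the classifier
# live in one pipeline, so hyperparameters of both stages can be tuned
# jointly, e.g. with GridSearchCV over tfidf__max_features and clf__C.
pipe = Pipeline([("tfidf", TfidfVectorizer(max_features=100)),
                 ("clf", LogisticRegression(max_iter=1000))])
pipe.fit(texts, labels)
print(pipe.predict(["free money prize", "monday meeting schedule"]))
```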
    Towards Practical Robustness Analysis for DNNs based on PAC-Model Learning. (arXiv:2101.10102v2 [cs.LG] UPDATED)
    To analyse local robustness properties of deep neural networks (DNNs), we present a practical framework from a model learning perspective. Based on black-box model learning with scenario optimisation, we abstract the local behaviour of a DNN via an affine model with the probably approximately correct (PAC) guarantee. From the learned model, we can infer the corresponding PAC-model robustness property. The innovation of our work is the integration of model learning into PAC robustness analysis: that is, we construct a PAC guarantee on the model level instead of sample distribution, which induces a more faithful and accurate robustness evaluation. This is in contrast to existing statistical methods without model learning. We implement our method in a prototypical tool named DeepPAC. As a black-box method, DeepPAC is scalable and efficient, especially when DNNs have complex structures or high-dimensional inputs. We extensively evaluate DeepPAC, with 4 baselines (using formal verification, statistical methods, testing and adversarial attack) and 20 DNN models across 3 datasets, including MNIST, CIFAR-10, and ImageNet. It is shown that DeepPAC outperforms the state-of-the-art statistical method PROVERO, and it achieves more practical robustness analysis than the formal verification tool ERAN. Also, its results are consistent with existing DNN testing work like DeepGini.
    Keys to Accurate Feature Extraction Using Residual Spiking Neural Networks. (arXiv:2111.05955v3 [cs.LG] UPDATED)
    Spiking neural networks (SNNs) have become an interesting alternative to conventional artificial neural networks (ANNs) thanks to their temporal processing capabilities and energy-efficient implementations in neuromorphic hardware. However, the challenges involved in training SNNs have limited their performance in terms of accuracy and thus their applications. Improving learning algorithms and neural architectures for more accurate feature extraction is therefore one of the current priorities in SNN research. In this paper we present a study on the key components of modern spiking architectures. We empirically compare different techniques, taken from the best performing networks, on image classification datasets. We design a spiking version of the successful residual network architecture and provide an in-depth study of the possible implementations of spiking residual connections. Our results provide a state-of-the-art guide to SNN design, which allows informed choices to be made when trying to build the optimal visual feature extractor. Finally, our network outperforms previous SNN architectures on the CIFAR-10 (94.14%) and CIFAR-100 (74.65%) datasets and matches the state of the art on DVS-CIFAR10 (72.98%), with fewer parameters than the previous state of the art and without the need for ANN-SNN conversion. Code available at https://github.com/VicenteAlex/Spiking_ResNet
    Inspection-L: A Self-Supervised GNN-Based Money Laundering Detection System for Bitcoin. (arXiv:2203.10465v2 [cs.CR] UPDATED)
    Criminals have become increasingly experienced in using cryptocurrencies, such as Bitcoin, for money laundering. Cryptocurrencies can hide criminal identities and transfer hundreds of millions of dollars of dirty funds through criminal digital wallets. However, this is something of a paradox, because cryptocurrencies are gold mines for open-source intelligence, giving law enforcement agencies more power to conduct forensic analyses. This paper proposes Inspection-L, a graph neural network (GNN) framework based on self-supervised Deep Graph Infomax (DGI) combined with a supervised learning algorithm, namely Random Forest (RF), to detect illicit transactions for anti-money laundering (AML). To the best of our knowledge, ours is the first proposal to apply self-supervised GNNs to the problem of AML in Bitcoin. The proposed method has been evaluated on the Elliptic dataset and shows that our approach outperforms the baseline in terms of key classification metrics, demonstrating the potential of self-supervised GNNs for detecting illicit cryptocurrency transactions.
    Meaningful machine learning models and machine-learned pharmacophores from fragment screening campaigns. (arXiv:2204.06348v1 [q-bio.BM])
    Machine learning (ML) is widely used in drug discovery to train models that predict protein-ligand binding. These models are of great value to medicinal chemists, in particular if they provide case-specific insight into the physical interactions that drive the binding process. In this study we derive ML models from over 50 fragment-screening campaigns to introduce two important elements that we believe are absent in most -- if not all -- ML studies of this type reported to date: First, alongside the observed hits we use to train our models, we incorporate true misses and show that these experimentally validated negative data are of significant importance to the quality of the derived models. Second, we provide a physically interpretable and verifiable representation of what the ML model considers important for successful binding. This representation is derived from a straightforward attribution procedure that explains the prediction in terms of the (inter-)action of chemical environments. Critically, we validate the attribution outcome on a large scale against prior annotations made independently by expert molecular modellers. We find good agreement between the key molecular substructures proposed by the ML model and those assigned manually, even when the model's performance in discriminating hits from misses is far from perfect. By projecting the attribution onto predefined interaction prototypes (pharmacophores), we show that ML allows us to formulate simple rules for what drives fragment binding against a target automatically from screening data.
    Online greedy identification of linear dynamical systems. (arXiv:2204.06375v1 [stat.ML])
    This work addresses the problem of exploration in an unknown environment. For linear dynamical systems, we use an experimental design framework and introduce an online greedy policy where the control maximizes the information of the next step. In a setting with a limited number of experimental trials, our algorithm has low complexity and shows experimentally competitive performances compared to more elaborate gradient-based methods.
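    A one-step version of such a greedy information-maximizing policy can be sketched as a D-optimal selection over candidate inputs; the candidate set and information-matrix update below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def greedy_input(M, candidates):
    """Pick the candidate excitation direction that most increases the
    log-determinant of the information matrix M (a D-optimal greedy step)."""
    scores = [np.linalg.slogdet(M + np.outer(c, c))[1] for c in candidates]
    return candidates[int(np.argmax(scores))]

# Information matrix after some exploration: direction 0 is already
# well excited, direction 1 barely at all.
M = np.diag([10.0, 1.0])
candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
chosen = greedy_input(M, candidates)
print(chosen)
```

    The greedy step prefers the poorly excited direction, since det(M + e₂e₂ᵀ) = 20 exceeds det(M + e₁e₁ᵀ) = 11, which is the "maximize the information of the next step" behaviour described above.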
    Convex-Concave Min-Max Stackelberg Games. (arXiv:2110.05192v4 [cs.GT] UPDATED)
    Min-max optimization problems (i.e., min-max games) have been attracting a great deal of attention because of their applicability to a wide range of machine learning problems. Although significant progress has been made recently, the literature to date has focused on games with independent strategy sets; little is known about solving games with dependent strategy sets, which can be characterized as min-max Stackelberg games. We introduce two first-order methods that solve a large class of convex-concave min-max Stackelberg games, and show that our methods converge in polynomial time. Min-max Stackelberg games were first studied by Wald, under the posthumous name of Wald's maximin model, a variant of which is the main paradigm used in robust optimization, which means that our methods can likewise solve many convex robust optimization problems. We observe that the computation of competitive equilibria in Fisher markets also comprises a min-max Stackelberg game. Further, we demonstrate the efficacy and efficiency of our algorithms in practice by computing competitive equilibria in Fisher markets with varying utility structures. Our experiments suggest potential ways to extend our theoretical results, by demonstrating how different smoothness properties can affect the convergence rate of our algorithms.
    Hybrid Neural Network Augmented Physics-based Models for Nonlinear Filtering. (arXiv:2204.06471v1 [cs.LG])
    In this paper we present a hybrid neural network augmented physics-based modeling (APBM) framework for Bayesian nonlinear latent space estimation. The proposed APBM strategy allows for model adaptation when new operation conditions come into play or the physics-based model is insufficient (or incomplete) to properly describe the latent phenomenon. One advantage of the APBMs and our estimation procedure is the capability of maintaining the physical interpretability of estimated states. Furthermore, we propose a constraint filtering approach to control the neural network contributions to the overall model. We also exploit assumed density filtering techniques and cubature integration rules to present a flexible estimation strategy that can easily deal with nonlinear models and high-dimensional latent spaces. Finally, we demonstrate the efficacy of our methodology by leveraging a target tracking scenario with nonlinear and incomplete measurement and acceleration models, respectively.
    Overparameterized Linear Regression under Adversarial Attacks. (arXiv:2204.06274v1 [stat.ML])
    As machine learning models start to be used in critical applications, their vulnerabilities and brittleness become a pressing concern. Adversarial attacks are a popular framework for studying these vulnerabilities. In this work, we study the error of linear regression in the face of adversarial attacks. We provide bounds on the error in terms of the traditional risk and the parameter norm, and show how these bounds make it possible to use analyses from non-adversarial setups to study the adversarial risk. The usefulness of these results is illustrated by shedding light on whether or not overparameterized linear models can be adversarially robust. We show that adding features to linear models can be a source of either additional robustness or brittleness. We show that these differences appear due to scaling and to how the $\ell_1$ and $\ell_2$ norms of random projections concentrate. We also show how the reformulation we propose allows adversarial training to be solved as a convex optimization problem. This is then used as a tool to study how adversarial training and other regularization methods might affect the robustness of the estimated models.
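    For a linear model under an $\ell_\infty$-bounded attack, the worst-case error decomposes exactly into the clean residual plus $\varepsilon$ times the $\ell_1$ norm of the parameters, which is the kind of risk-plus-parameter-norm relationship discussed above. A numerical check on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)            # a fitted linear model f(x) = w @ x
x = rng.normal(size=5)
y = 1.0
eps = 0.1                         # l-infinity attack budget

r = y - w @ x                                   # clean residual
worst_case = abs(r) + eps * np.abs(w).sum()     # |r| + eps * ||w||_1

# The bound is attained by the sign-aligned perturbation
# delta = -eps * sign(r) * sign(w), so for linear models it is exact.
delta = -eps * np.sign(r) * np.sign(w)
attacked = abs(y - w @ (x + delta))
print(worst_case, attacked)
```

    This is why the $\ell_1$ norm of the parameters, rather than the clean risk alone, governs adversarial robustness for these models.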
    Safer Autonomous Driving in a Stochastic, Partially-Observable Environment by Hierarchical Contingency Planning. (arXiv:2204.06509v1 [cs.LG])
    When learning to act in a stochastic, partially observable environment, an intelligent agent should be prepared to anticipate a change in its belief of the environment state, and be capable of adapting its actions on-the-fly to changing conditions. As humans, we are able to form contingency plans when learning a task with the explicit aim of being able to correct errors in the initial control, and hence prove useful if ever there is a sudden change in our perception of the environment which requires immediate corrective action. This is especially the case for autonomous vehicles (AVs) navigating real-world situations where safety is paramount, and a strong ability to react to a changing belief about the environment is truly needed. In this paper we explore an end-to-end approach, from training to execution, for learning robust contingency plans and combining them with a hierarchical planner to obtain a robust agent policy in an autonomous navigation task where other vehicles' behaviours are unknown, and the agent's belief about these behaviours is subject to sudden, last-second change. We show that our approach results in robust, safe behaviour in a partially observable, stochastic environment, generalizing well over environment dynamics not seen during training.
    Massive MIMO Beam Management in Sub-6 GHz 5G NR. (arXiv:2204.06064v1 [eess.SP])
    Beam codebooks are a new feature of massive multiple-input multiple-output (M-MIMO) in 5G new radio (NR). Codebooks comprised of beamforming vectors are used to transmit reference signals and obtain limited channel state information (CSI) from receivers via the codeword index. This enables large arrays that cannot otherwise obtain sufficient CSI. The performance, however, is limited by the codebook design. In this paper, we show that machine learning can be used to train site-specific codebooks for initial access. We design a neural network based on an autoencoder architecture that uses a beamspace observation in combination with RF environment characteristics to improve the synchronization signal (SS) burst codebook. We test our algorithm using a flexible dataset of channels generated from QuaDRiGa. The results show that our model outperforms the industry standard (DFT beams) and approaches the optimal performance (perfect CSI and singular value decomposition (SVD)-based beamforming), using only a few bits of feedback.
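The DFT-beam baseline referenced above can be written down directly; a minimal numpy sketch (array size and beam count are illustrative, and real SS-burst codebooks carry additional structure the sketch omits):

```python
import numpy as np

def dft_codebook(n_ant: int, n_beams: int) -> np.ndarray:
    """Columns are unit-norm DFT beamforming vectors for a uniform linear array."""
    freqs = np.arange(n_beams) / n_beams            # normalized spatial frequencies
    n = np.arange(n_ant)[:, None]
    return np.exp(2j * np.pi * n * freqs[None, :]) / np.sqrt(n_ant)

def best_beam(W: np.ndarray, h: np.ndarray) -> int:
    """Codeword index with the largest beamforming gain |w^H h|."""
    return int(np.argmax(np.abs(W.conj().T @ h)))

W = dft_codebook(n_ant=16, n_beams=16)
idx = best_beam(W, W[:, 3])   # a channel aligned with beam 3 is recovered
```

The receiver feeds back only `idx` (a few bits), which is the limited-CSI regime the learned codebook of the paper is designed for.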
    Data-heterogeneity-aware Mixing for Decentralized Learning. (arXiv:2204.06477v1 [cs.LG])
    Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs. However, most existing approaches toward decentralized learning disregard the interaction between data heterogeneity and graph topology. In this paper, we characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes. We propose a metric that quantifies the ability of a graph to mix the current gradients. We further prove that the metric controls the convergence rate, particularly in settings where the heterogeneity across nodes dominates the stochasticity between updates for a given node. Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric using standard convex constrained optimization and sketching techniques. Through comprehensive experiments on standard computer vision and NLP benchmarks, we show that our approach leads to improvement in test performance for a wide range of tasks.
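For intuition, the standard data-agnostic gossip baseline that the paper's heterogeneity-aware weights improve on can be sketched as follows (the Metropolis-Hastings rule below is a common default, not the paper's optimized mixing):

```python
import numpy as np

def metropolis_weights(adj):
    """Symmetric, doubly-stochastic Metropolis-Hastings mixing matrix
    for an undirected graph given as a 0/1 adjacency matrix."""
    deg = adj.sum(axis=1)
    n = len(adj)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# Ring of 4 nodes: repeated mixing drives every local value to the global mean.
adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
W = metropolis_weights(adj)
x = np.array([4.0, 0.0, 0.0, 0.0])   # node-local values (e.g. gradients)
for _ in range(200):
    x = W @ x
```

The paper's point is that when `x` holds heterogeneous gradients, the choice of `W` should depend on that heterogeneity rather than on topology alone.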
    Approximation of Lipschitz Functions using Deep Spline Neural Networks. (arXiv:2204.06233v1 [cs.LG])
    Lipschitz-constrained neural networks have many applications in machine learning. Since designing and training expressive Lipschitz-constrained networks is very challenging, there is a need for improved methods and a better theoretical understanding. Unfortunately, it turns out that ReLU networks have provable disadvantages in this setting. Hence, we propose to use learnable spline activation functions with at least 3 linear regions instead. We prove that this choice is optimal among all component-wise $1$-Lipschitz activation functions in the sense that no other weight constrained architecture can approximate a larger class of functions. Additionally, this choice is at least as expressive as the recently introduced non component-wise Groupsort activation function for spectral-norm-constrained weights. Previously published numerical results support our theoretical findings.
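As a fixed, non-learnable illustration of the proposed activation class: a continuous piecewise-linear function with 3 linear regions whose slopes stay in [-1, 1] is 1-Lipschitz. In the paper the knots and slopes are trained under this constraint; the values below are arbitrary:

```python
import numpy as np

def pwl3(x, t0=-1.0, t1=1.0, slopes=(1.0, 0.2, 1.0)):
    """Continuous piecewise-linear activation with 3 linear regions
    (slopes s0, s1, s2 on (-inf,t0], [t0,t1], [t1,inf)).
    Keeping every slope in [-1, 1] makes the activation 1-Lipschitz."""
    s0, s1, s2 = slopes
    return (s1 * x
            + (s0 - s1) * np.minimum(x - t0, 0.0)
            + (s2 - s1) * np.maximum(x - t1, 0.0))

x = np.linspace(-3, 3, 601)
y = pwl3(x)
max_slope = np.max(np.abs(np.diff(y) / np.diff(x)))   # empirical Lipschitz bound
```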
    Out-of-distribution Detection with Deep Nearest Neighbors. (arXiv:2204.06507v1 [cs.LG])
    Out-of-distribution (OOD) detection is a critical task for deploying machine learning models in the open world. Distance-based methods have demonstrated promise, where testing samples are detected as OOD if they are relatively far away from in-distribution (ID) data. However, prior methods impose a strong distributional assumption of the underlying feature space, which may not always hold. In this paper, we explore the efficacy of non-parametric nearest-neighbor distance for OOD detection, which has been largely overlooked in the literature. Unlike prior works, our method does not impose any distributional assumption, hence providing stronger flexibility and generality. We demonstrate the effectiveness of nearest-neighbor-based OOD detection on several benchmarks and establish superior performance. Under the same model trained on ImageNet-1k, our method substantially reduces the false positive rate (FPR@TPR95) by 24.77% compared to a strong baseline SSD+, which uses a parametric approach Mahalanobis distance in detection.
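The scoring rule itself is small enough to sketch; the feature extractor and the feature normalization used in practice are omitted, and the Gaussian clouds below are illustrative stand-ins for network features:

```python
import numpy as np

def knn_ood_score(id_feats, z, k=5):
    """Non-parametric OOD score: distance from a test feature z to its k-th
    nearest in-distribution feature. Larger score => more likely OOD.
    No distributional assumption is placed on the feature space."""
    d = np.linalg.norm(id_feats - z, axis=1)
    return np.sort(d)[k - 1]

rng = np.random.default_rng(0)
id_feats = rng.normal(0.0, 1.0, size=(500, 8))        # stand-in for ID features
score_id = knn_ood_score(id_feats, rng.normal(0.0, 1.0, size=8))
score_ood = knn_ood_score(id_feats, rng.normal(6.0, 1.0, size=8))  # far from ID
```

A threshold on this score (chosen, e.g., at 95% TPR on ID data) turns it into the detector evaluated in the paper.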
    Sigma-Delta and Distributed Noise-Shaping Quantization Methods for Random Fourier Features. (arXiv:2106.02614v2 [cs.LG] UPDATED)
    We propose the use of low bit-depth Sigma-Delta and distributed noise-shaping methods for quantizing the Random Fourier features (RFFs) associated with shift-invariant kernels. We prove that our quantized RFFs -- even in the case of $1$-bit quantization -- allow a high accuracy approximation of the underlying kernels, and the approximation error decays at least polynomially fast as the dimension of the RFFs increases. We also show that the quantized RFFs can be further compressed, yielding an excellent trade-off between memory use and accuracy. Namely, the approximation error now decays exponentially as a function of the bits used. Moreover, we empirically show by testing the performance of our methods on several machine learning tasks that our method compares favorably to other state of the art quantization methods in this context.
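For context, the unquantized RFFs being compressed here can be sketched in a few lines. The paper's Sigma-Delta and noise-shaping quantizers operate on the feature vector below; naively quantizing it with `sign` would bias the kernel estimate, which is what the proposed schemes avoid:

```python
import numpy as np

def rff(X, n_feat, gamma, rng):
    """Random Fourier features z(x) with E[z(x)·z(y)] = exp(-gamma*||x-y||^2)."""
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_feat))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)
    return np.sqrt(2.0 / n_feat) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))
Z = rff(X, n_feat=20000, gamma=0.5, rng=rng)
approx = Z[0] @ Z[1]                                   # kernel estimate
exact = np.exp(-0.5 * np.sum((X[0] - X[1]) ** 2))      # true RBF kernel value
```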
    TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data. (arXiv:2201.07284v5 [cs.LG] UPDATED)
Efficient anomaly detection and diagnosis in multivariate time-series data is of great importance for modern industrial applications. However, building a system that is able to quickly and accurately pinpoint anomalous observations is a challenging problem. This is due to the lack of anomaly labels, high data volatility and the demands of ultra-low inference times in modern applications. Despite the recent developments of deep learning approaches for anomaly detection, only a few of them can address all of these challenges. In this paper, we propose TranAD, a deep transformer network based anomaly detection and diagnosis model which uses attention-based sequence encoders to swiftly perform inference with the knowledge of the broader temporal trends in the data. TranAD uses focus score-based self-conditioning to enable robust multi-modal feature extraction and adversarial training to gain stability. Additionally, model-agnostic meta learning (MAML) allows us to train the model using limited data. Extensive empirical studies on six publicly available datasets demonstrate that TranAD can outperform state-of-the-art baseline methods in detection and diagnosis performance with data and time-efficient training. Specifically, TranAD increases F1 scores by up to 17% while reducing training times by up to 99% compared to the baselines.
    ADASYN-Random Forest Based Intrusion Detection Model. (arXiv:2105.04301v5 [cs.CR] UPDATED)
Intrusion detection has been a key topic in the field of cyber security, and today's common network threats are varied and constantly evolving. Because the serious class imbalance of intrusion detection datasets leads to low classification performance on attack types with few samples and makes it difficult to detect network attacks accurately and efficiently, this paper proposes using the Adaptive Synthetic Sampling (ADASYN) method to balance the datasets. In addition, the Random Forest algorithm is used to train intrusion detection classifiers. Comparative intrusion detection experiments on the CICIDS 2017 dataset show that ADASYN combined with Random Forest performs better. Based on the experimental results, the improvements in precision, recall, F1 score and AUC after applying ADASYN are then analyzed. The experiments show that the proposed method can be applied to intrusion detection on large datasets and can effectively improve the classification accuracy of network attack behaviors. Compared with traditional machine learning models, it offers better performance, generalization ability and robustness.
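A numpy-only sketch of the ADASYN idea under simplifying assumptions (the density-estimation details of the full algorithm are reduced to a k-NN majority fraction; production use would reach for `imbalanced-learn`'s `ADASYN` and pair it with a Random Forest classifier, as the paper does):

```python
import numpy as np

def adasyn_like_oversample(X, y, minority=1, k=5, rng=None):
    """Simplified ADASYN-style oversampling: minority points surrounded by
    more majority-class neighbors receive more synthetic (interpolated)
    samples, shifting learning focus toward hard-to-learn regions."""
    if rng is None:
        rng = np.random.default_rng(0)
    X_min = X[y == minority]
    n_new_total = int((y != minority).sum() - (y == minority).sum())
    if n_new_total <= 0 or len(X_min) < 2:
        return X, y
    # r_i: fraction of majority-class points among each minority point's k-NN
    r = np.empty(len(X_min))
    for i, x in enumerate(X_min):
        nn = np.argsort(np.linalg.norm(X - x, axis=1))[1:k + 1]
        r[i] = (y[nn] != minority).mean()
    r = r / r.sum() if r.sum() > 0 else np.full(len(X_min), 1.0 / len(X_min))
    synth = []
    for i, x in enumerate(X_min):
        n_i = int(round(r[i] * n_new_total))
        nn = np.argsort(np.linalg.norm(X_min - x, axis=1))[1:k + 1]
        for _ in range(n_i):
            z = X_min[rng.choice(nn)]
            synth.append(x + rng.uniform() * (z - x))  # interpolate toward neighbor
    if not synth:
        return X, y
    return np.vstack([X, synth]), np.concatenate([y, np.full(len(synth), minority)])
```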
    Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency. (arXiv:2112.06054v3 [cs.LG] UPDATED)
    Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy regardless of the fact that these off-policy extensions could either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation (D2-Imitation), which operates by first partitioning samples into two replay buffers and then learning a deterministic policy via off-policy reinforcement learning. Our empirical results show that D2-Imitation is effective in achieving good sample efficiency, outperforming several off-policy extension approaches of adversarial imitation on many control tasks.
    Enabling Synthetic Data adoption in regulated domains. (arXiv:2204.06297v1 [cs.LG])
The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than algorithms, bringing forward new challenges. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. Specific approaches to address the privacy issue have been developed, such as Privacy Enhancing Technologies. However, they frequently cause loss of information, putting forward a crucial trade-off between data quality and privacy. A clever way to bypass such a conundrum relies on Synthetic Data: data obtained from a generative process, learning the real data properties. Both Academia and Industry have realized the importance of evaluating synthetic data quality: without all-round reliable metrics, the innovative data generation task has no proper objective function to maximize. Despite that, the topic remains under-explored. For this reason, we systematically catalog the important traits of synthetic data quality and privacy, and devise a specific methodology to test them. The result is DAISYnt (aDoption of Artificial Intelligence SYnthesis): a comprehensive suite of advanced tests, which sets a de facto standard for synthetic data evaluation. As a practical use-case, a variety of generative algorithms have been trained on real-world Credit Bureau Data. The best model has been assessed, using DAISYnt on the different synthetic replicas. Further potential uses, among others, entail auditing and fine-tuning of generative models or ensuring high quality of a given synthetic dataset. Eventually, DAISYnt may pave the way to synthetic data adoption in highly regulated domains, ranging from Finance to Healthcare, through Insurance and Education.
    Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation. (arXiv:2204.06517v1 [cs.IR])
User interests are usually dynamic in the real world, which poses both theoretical and practical challenges for learning accurate preferences from rich behavior data. Among existing user behavior modeling solutions, attention networks are widely adopted for their effectiveness and relative simplicity. Despite being extensively studied, existing attentions still suffer from two limitations: i) conventional attentions mainly take into account the spatial correlation between user behaviors, regardless of the distance between those behaviors in the continuous time space; and ii) these attentions mostly provide a dense and undistinguished distribution over all past behaviors and then attentively encode them into the output latent representations. This, however, is not suitable in practical scenarios where a user's future actions are relevant to a small subset of her/his historical behaviors. In this paper, we propose a novel attention network, named self-modulating attention, that models the complex and non-linearly evolving dynamic user preferences. We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
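A toy stand-in for the temporal part of the idea: down-weighting attention logits by the age of each behavior. The paper's self-modulating attention learns this modulation and also sparsifies the distribution; the fixed exponential decay below is only illustrative:

```python
import numpy as np

def time_decayed_attention(q, K, t, t_now, decay=0.5):
    """Attention over past behaviors where each behavior's logit is reduced
    in proportion to its age, so temporally distant behaviors are
    down-weighted before the softmax."""
    logits = K @ q - decay * (t_now - t)
    w = np.exp(logits - logits.max())   # numerically stable softmax
    return w / w.sum()

K = np.array([[1.0, 0.0], [1.0, 0.0]])   # two equally similar past behaviors...
t = np.array([0.0, 9.0])                 # ...one old, one recent
w = time_decayed_attention(np.array([1.0, 0.0]), K, t, t_now=10.0)
```

With equal content similarity, the more recent behavior receives the larger attention weight.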
    A Review of Machine Learning Methods Applied to Structural Dynamics and Vibroacoustic. (arXiv:2204.06362v1 [cs.LG])
The use of Machine Learning (ML) has rapidly spread across several fields, having encountered many applications in Structural Dynamics and Vibroacoustic (SD\&V). The increasing capabilities of ML to unveil insights from data, driven by unprecedented data availability, algorithmic advances and computational power, enhance decision making, uncertainty handling, pattern recognition and real-time assessments. Three main applications in SD\&V have taken advantage of these benefits. In Structural Health Monitoring, ML detection and prognosis lead to safe operation and optimized maintenance schedules. System identification and control design are leveraged by ML techniques in Active Noise Control and Active Vibration Control. Finally, the so-called ML-based surrogate models provide fast alternatives to costly simulations, enabling robust and optimized product design. Despite the many works in the area, they have not yet been systematically reviewed and analyzed. Therefore, to keep track of and understand this ongoing integration of fields, this paper presents a survey of ML applications in SD\&V analyses, shedding light on the current state of implementation and emerging opportunities. The main methodologies, advantages, limitations, and recommendations based on scientific knowledge were identified for each of the three applications. Moreover, the paper considers the role of Digital Twins and Physics Guided ML to overcome current challenges and power future research progress. As a result, the survey provides a broad overview of the present landscape of ML applied in SD\&V and guides the reader to an advanced understanding of progress and prospects in the field.
Local and global topological complexity measures of ReLU neural network functions. (arXiv:2204.06062v1 [math.AT])
    We apply a generalized piecewise-linear (PL) version of Morse theory due to Grunert-Kuhnel-Rote to define and study new local and global notions of topological complexity for fully-connected feedforward ReLU neural network functions, F: R^n -> R. Along the way, we show how to construct, for each such F, a canonical polytopal complex K(F) and a deformation retract of the domain onto K(F), yielding a convenient compact model for performing calculations. We also give a combinatorial description of local complexity for depth 2 networks, and a construction showing that local complexity can be arbitrarily high.
    Do We Need Anisotropic Graph Neural Networks?. (arXiv:2104.01481v4 [cs.LG] UPDATED)
Common wisdom in the graph neural network (GNN) community dictates that anisotropic models -- in which messages sent between nodes are a function of both the source and target node -- are required to achieve state-of-the-art performance. Benchmarks to date have demonstrated that these models perform better than comparable isotropic models -- where messages are a function of the source node only. In this work we provide empirical evidence challenging this narrative: we propose an isotropic GNN, which we call Efficient Graph Convolution (EGC), that consistently outperforms comparable anisotropic models, including the popular GAT and PNA architectures, by using spatially-varying adaptive filters. In addition to raising important questions for the GNN community, our work has significant real-world implications for efficiency. EGC achieves higher model accuracy, with lower memory consumption and latency, along with characteristics suited to accelerator implementation, while being a drop-in replacement for existing architectures. As an isotropic model, it requires memory proportional to the number of vertices in the graph ($\mathcal{O}(V)$); in contrast, anisotropic models require memory proportional to the number of edges ($\mathcal{O}(E)$). We demonstrate that EGC outperforms existing approaches across 6 large and diverse benchmark datasets, and conclude by discussing questions that our work raises for the community going forward. Code and pretrained models for our experiments are provided at https://github.com/shyam196/egc.
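The isotropic/anisotropic distinction, and the $\mathcal{O}(V)$-vs-$\mathcal{O}(E)$ memory argument, can be made concrete with a dense toy sketch (EGC's spatially-varying adaptive filters are not reproduced here):

```python
import numpy as np

def isotropic_layer(A, H, W):
    """Isotropic message passing: node features are transformed once
    (V messages) and each message depends only on the source node."""
    return A @ (H @ W)

def anisotropic_layer(A, H, W_src, W_dst):
    """Anisotropic variant: every edge forms its own message mixing source
    and target states, so intermediate storage scales with the edge count."""
    n, d_out = len(A), W_src.shape[1]
    out = np.zeros((n, d_out))
    for i in range(n):
        for j in range(n):
            if A[i, j]:
                out[i] += H[j] @ W_src + H[i] @ W_dst   # per-edge message
    return out
```

With the target-side weights zeroed, the anisotropic layer collapses to the isotropic one, which is the sense in which isotropic models are a restriction.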
    Slope stability predictions on spatially variable random fields using machine learning surrogate models. (arXiv:2204.06097v1 [cs.LG])
Random field Monte Carlo (MC) reliability analysis is a robust stochastic method to determine the probability of failure. This method, however, requires a large number of numerical simulations demanding high computational costs. This paper explores the efficiency of different machine learning (ML) algorithms used as surrogate models trained on a limited number of random field slope stability simulations in predicting the results of large datasets. The MC data in this paper require only the examination of failure or non-failure, circumventing the time-consuming calculation of factors of safety. An extensive dataset is generated, consisting of 120,000 finite difference MC slope stability simulations incorporating different levels of soil heterogeneity and anisotropy. The Bagging Ensemble, Random Forest and Support Vector classifiers are found to be the superior models for this problem amongst 9 different models and ensemble classifiers. Trained on only 0.47% of the data (500 samples), the ML model can classify the entire 120,000 samples with an accuracy of 85% and an AUC score of 91%. The performance of ML methods in classifying the random field slope stability results generally reduces with higher anisotropy and heterogeneity of soil. The ML-assisted MC reliability analysis proves to be a robust stochastic method, where the error in the predicted probability of failure using 5% of the MC data is only 0.46% on average. The approach reduced the computational time from 306 days to less than 6 hours.
    Conditional Gradients for the Approximately Vanishing Ideal. (arXiv:2202.03349v6 [cs.LG] UPDATED)
    The vanishing ideal of a set of points $X\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite set of polynomials called generators. To accommodate the noise in the data set, we introduce the Conditional Gradients Approximately Vanishing Ideal algorithm (CGAVI) for the construction of the set of generators of the approximately vanishing ideal. The constructed set of generators captures polynomial structures in data and gives rise to a feature map that can, for example, be used in combination with a linear classifier for supervised learning. In CGAVI, we construct the set of generators by solving specific instances of (constrained) convex optimization problems with the Pairwise Frank-Wolfe algorithm (PFW). Among other things, the constructed generators inherit the LASSO generalization bound and not only vanish on the training but also on out-sample data. Moreover, CGAVI admits a compact representation of the approximately vanishing ideal by constructing few generators with sparse coefficient vectors.
    Deep Annotation of Therapeutic Working Alliance in Psychotherapy. (arXiv:2204.05522v1 [q-bio.NC] CROSS LISTED)
The therapeutic working alliance is an important predictor of the outcome of the psychotherapy treatment. In practice, the working alliance is estimated from a set of scoring questionnaires in an inventory that both the patient and the therapists fill out. In this work, we propose an analytical framework of directly inferring the therapeutic working alliance from the natural language within the psychotherapy sessions in a turn-level resolution with deep embeddings such as the Doc2Vec and SentenceBERT models. The transcript of each psychotherapy session can be transcribed and generated in real-time from the session speech recordings, and these embedded dialogues are compared with the distributed representations of the statements in the working alliance inventory. We demonstrate, in a real-world dataset with over 950 sessions of psychotherapy treatments in anxiety, depression, schizophrenia and suicidal patients, the effectiveness of this method in mapping out trajectories of patient-therapist alignment and the interpretability that can offer insights in clinical psychiatry. We believe such a framework can provide timely feedback to the therapist regarding the quality of the conversation in interview sessions.
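The turn-to-inventory comparison at the heart of the framework reduces to similarity in a shared embedding space; a sketch with cosine similarity, using random placeholder vectors in place of Doc2Vec/SentenceBERT embeddings:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def alliance_scores(turn_emb, inventory_embs):
    """Score one dialogue turn against each working-alliance inventory item
    by cosine similarity in the shared embedding space."""
    return np.array([cosine(turn_emb, e) for e in inventory_embs])

rng = np.random.default_rng(0)
inventory = rng.normal(size=(5, 16))               # placeholder item embeddings
turn = inventory[2] + 0.01 * rng.normal(size=16)   # a turn close to item 2
scores = alliance_scores(turn, inventory)
```

Tracking `scores` turn by turn over a session yields the alignment trajectories described above.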
    Neural Operator with Regularity Structure for Modeling Dynamics Driven by SPDEs. (arXiv:2204.06255v1 [cs.LG])
Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics. Neural Operators, generalizations of neural networks capable of learning maps between infinite-dimensional spaces, are strong tools for solving parametric PDEs. However, they lack the ability to model SPDEs, which usually have poor regularity due to the driving noise. As the theory of regularity structures has achieved great success in analyzing SPDEs and provides the concept of model feature vectors that well approximate SPDEs' solutions, we propose the Neural Operator with Regularity Structure (NORS), which incorporates these feature vectors for modeling dynamics driven by SPDEs. We conduct experiments on a variety of SPDEs, including the dynamic $\Phi^4_1$ model and the 2d stochastic Navier-Stokes equation, and the results demonstrate that NORS is resolution-invariant, efficient, and achieves one order of magnitude lower error with a modest amount of data.
    Random Graph Embedding and Joint Sparse Regularization for Multi-label Feature Selection. (arXiv:2204.06445v1 [stat.ML])
Multi-label learning is often used to mine the correlation between variables and multiple labels, and its research focuses on fully extracting the information between variables and labels. The $\ell_{2,1}$ regularization is often used to get a sparse coefficient matrix, but it cannot effectively solve the problem of multicollinearity among variables. In this paper, we propose a model that chooses the most relevant variables by solving a joint constrained optimization problem using $\ell_{2,1}$ regularization and Frobenius regularization. In manifold regularization, we carry out a random walk strategy based on the joint structure to construct a neighborhood graph, which is highly robust to outliers. In addition, we give an iterative algorithm for the proposed method and prove its convergence. The experiments on real-world data sets also show that the comprehensive performance of our method is consistently better than the classical method.
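For reference, the $\ell_{2,1}$ penalty and its proximal map (row-wise soft thresholding) are what make the regularizer select whole variables; the Frobenius term and manifold regularizer of the proposed model are omitted in this sketch:

```python
import numpy as np

def l21_norm(W):
    """Sum of row l2 norms; penalizing it zeroes out entire rows,
    i.e. removes whole variables at once."""
    return np.linalg.norm(W, axis=1).sum()

def l21_prox(W, t):
    """Proximal operator of t * ||.||_{2,1}: row-wise soft thresholding.
    Rows with norm below t are zeroed; larger rows are shrunk."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale

W = np.array([[3.0, 4.0],    # row norm 5.0: shrunk
              [0.1, 0.1]])   # row norm ~0.14: zeroed
P = l21_prox(W, t=1.0)
```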
    Is Speech Pathology a Biomarker in Automatic Speaker Verification?. (arXiv:2204.06450v1 [cs.SD])
With the advancements in deep learning (DL) and an increasing interest in data-driven speech processing methods, a major challenge for speech data scientists in the healthcare domain is the anonymization of pathological speech, which is a required step to be able to make them accessible as a public training resource. In this paper, we investigate pathological speech data and compare their speaker verifiability with that of healthy individuals. We utilize a large pathological speech corpus of more than 2,000 test subjects of various ages with a range of speech and voice disorders, and apply DL-based automatic speaker verification (ASV) techniques. As a result, we obtained a mean equal error rate (EER) of 0.86% with a standard deviation of 0.16%, which is a factor of three lower than for comparable healthy speech databases. We further perform detailed analyses of external influencing factors on ASV such as age, pathology, recording environment, and utterance length, to explore their respective effect. Our findings indicate that speech pathology is a potential biomarker in ASV. This is potentially of high interest for the anonymization of pathological speech data.
    Flexible Multiple-Objective Reinforcement Learning for Chip Placement. (arXiv:2204.06407v1 [cs.LG])
Recently, successful applications of reinforcement learning to chip placement have emerged. Pretrained models are necessary to improve efficiency and effectiveness. Currently, the weights of objective metrics (e.g., wirelength, congestion, and timing) are fixed during pretraining. However, fixed-weight models cannot generate the diversity of placements required for engineers to accommodate changing requirements as they arise. This paper proposes flexible multiple-objective reinforcement learning (MORL) to support objective functions with inference-time variable weights using just a single pretrained model. Our macro placement results show that MORL can generate the Pareto frontier of multiple objectives effectively.
    Epistemic Neural Networks. (arXiv:2107.08924v2 [cs.LG] UPDATED)
Effective decision, exploration, and adaptation often require an agent to know what it knows and, also, what it does not know. This capability relies on the quality of \textit{joint} predictions of labels assigned to multiple inputs. Conventional neural networks lack this capability and, since most research has focused on marginal predictions, this shortcoming has been largely overlooked. By assessing the quality of joint predictions it is possible to determine whether a neural network effectively distinguishes between epistemic uncertainty (that due to lack of knowledge) and aleatoric uncertainty (that due to chance). We introduce the \textit{epistemic neural network} (ENN) as a general interface for uncertainty modeling in deep learning. While prior approaches to uncertainty modeling can be viewed as ENNs, the new interface facilitates comparison of joint predictions, and the design of novel architectures and algorithms. In particular, we introduce the \textit{epinet}: an architecture that can supplement any existing neural network, including pretrained models, and can be trained with modest incremental computation to represent uncertainty. With an epinet, conventional neural networks outperform very large ensembles, consisting of hundreds or more particles, with orders of magnitude less computation. We demonstrate this efficacy across synthetic data, ImageNet, and sequential decision problems. As part of this effort we open-source experiment code.
    CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU. (arXiv:2204.06240v1 [cs.LG])
The click-through rate (CTR) prediction task is to predict whether a user will click on the recommended item. As mind-boggling amounts of data are produced online daily, accelerating CTR prediction model training is critical to ensuring an up-to-date model and reducing the training cost. One approach to increase the training speed is to apply large batch training. However, as shown in computer vision and natural language processing tasks, training with a large batch easily suffers from the loss of accuracy. Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks. To tackle this problem, we first theoretically show that different frequencies of ids make it challenging to scale hyperparameters when scaling the batch size. To stabilize the training process in a large batch size setting, we develop the adaptive Column-wise Clipping (CowClip). It enables an easy and effective scaling rule for the embeddings, which keeps the learning rate unchanged and scales the L2 loss. We conduct extensive experiments with four CTR prediction networks on two real-world datasets and successfully scale the batch size to 128 times the original without accuracy loss. In particular, for CTR prediction model DeepFM training on the Criteo dataset, our optimization framework enlarges the batch size from 1K to 128K with over 0.1% AUC improvement and reduces training time from 12 hours to 10 minutes on a single V100 GPU. Our code is available at https://github.com/zhengzangw/LargeBatchCTR.
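A non-adaptive sketch of the underlying clipping primitive: each id's embedding-gradient norm is clipped independently (CowClip itself adapts the threshold, e.g. using id frequency and parameter norms, which is not reproduced here):

```python
import numpy as np

def per_id_grad_clip(grad, max_norm):
    """Clip each embedding row's gradient norm independently, so rare ids
    with occasional huge gradients do not destabilize large-batch training.
    grad has shape (n_ids, emb_dim)."""
    norms = np.linalg.norm(grad, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return grad * scale

g = np.array([[6.0, 8.0],    # norm 10.0: clipped down to norm 1
              [0.3, 0.4]])   # norm 0.5: left untouched
gc = per_id_grad_clip(g, max_norm=1.0)
```

Global-norm clipping would instead shrink both rows by the same factor, letting one rare id's spike suppress every other update.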
    Large-scale multi-objective influence maximisation with network downscaling. (arXiv:2204.06250v1 [cs.SI])
Finding the most influential nodes in a network is a computationally hard problem with several possible applications in various kinds of network-based problems. While several methods have been proposed for tackling the influence maximisation (IM) problem, their runtime typically scales poorly when the network size increases. Here, we propose an original method, based on network downscaling, that allows a multi-objective evolutionary algorithm (MOEA) to solve the IM problem on a reduced scale network, while preserving the relevant properties of the original network. The downscaled solution is then upscaled to the original network, using a mechanism based on centrality metrics such as PageRank. Our results on eight large networks (including two with $\sim$50k nodes) demonstrate the effectiveness of the proposed method, with a more than 10-fold runtime reduction compared to running on the original network, and an up to $82\%$ time reduction compared to CELF.
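The centrality used for upscaling can be as simple as power-iteration PageRank; a sketch assuming an undirected graph with no dangling nodes (the seed-matching heuristic itself is not reproduced):

```python
import numpy as np

def pagerank(adj, d=0.85, iters=100):
    """Power-iteration PageRank on a row-stochastic transition matrix,
    with damping factor d."""
    adj = np.asarray(adj, float)
    n = len(adj)
    P = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-12)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1.0 - d) / n + d * (P.T @ r)
    return r

# Star graph: the hub is the most central node and a natural seed candidate
# when mapping a downscaled solution back to the original network.
star = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
ranks = pagerank(star)
```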
    DT2CAM: A Decision Tree to Content Addressable Memory Framework. (arXiv:2204.06114v1 [cs.AR])
Decision trees are considered one of the most powerful tools for data classification. Accelerating the decision tree search is crucial for on-the-edge applications that have limited power and latency budget. In this paper, we propose a Content Addressable Memory (CAM) Compiler for Decision Tree (DT) inference acceleration. We propose a novel "adaptive-precision" scheme that results in a compact implementation and enables an efficient bijective mapping to Ternary Content Addressable Memories while maintaining high inference accuracies. In addition, a Resistive-CAM (ReCAM) functional synthesizer is developed for mapping the decision tree to the ReCAM and performing functional simulations for energy, latency, and accuracy evaluations. We study the decision tree accuracy under hardware non-idealities including device defects, manufacturing variability, and input encoding noise. We test our framework on various DT datasets including \textit{Give Me Some Credit}, \textit{Titanic}, and \textit{COVID-19}. Our results reveal up to 42.4\% energy savings and up to 17.8x better energy-delay-area product compared to state-of-the-art hardware accelerators, and up to 333 million decisions per second for the pipelined implementation.
    Learnable Hypergraph Laplacian for Hypergraph Learning. (arXiv:2106.06666v2 [cs.LG] UPDATED)
Hypergraph Convolutional Neural Networks (HGCNNs) have demonstrated their potential in modeling high-order relations preserved in graph-structured data. However, most existing convolution filters are localized and determined by the pre-defined initial hypergraph topology, neglecting to explore implicit and long-range relations in real-world data. In this paper, we propose the first learning-based method tailored for constructing adaptive hypergraph structure, termed HypERgrAph Laplacian aDaptor (HERALD), which serves as a generic plug-and-play module for improving the representational power of HGCNNs. Specifically, HERALD adaptively optimizes the adjacency relationship between vertices and hyperedges in an end-to-end manner, and thus the task-aware hypergraph is learned. Furthermore, HERALD employs the self-attention mechanism to capture non-local relations between node pairs. Extensive experiments on various popular hypergraph datasets for node classification and graph classification tasks demonstrate that our approach obtains consistent and considerable performance enhancement, proving its effectiveness and generalization ability.
    InCoder: A Generative Model for Code Infilling and Synthesis. (arXiv:2204.05999v1 [cs.SE])
    Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while still performing comparably on standard program synthesis benchmarks in comparison to left-to-right only models pretrained at similar scale. The InCoder models and code are publicly released. https://sites.google.com/view/incoder-code-models
    An Analysis on Ensemble Learning optimized Medical Image Classification with Deep Convolutional Neural Networks. (arXiv:2201.11440v2 [cs.CV] UPDATED)
Novel and high-performance medical image classification pipelines are heavily utilizing ensemble learning strategies. The idea of ensemble learning is to assemble diverse models or multiple predictions and, thus, boost prediction performance. However, it is still an open question to what extent as well as which ensemble learning strategies are beneficial in deep learning based medical image classification pipelines. In this work, we proposed a reproducible medical image classification pipeline for analyzing the performance impact of the following ensemble learning techniques: Augmenting, Stacking, and Bagging. The pipeline consists of state-of-the-art preprocessing and image augmentation methods as well as 9 deep convolutional neural network architectures. It was applied to four popular medical imaging datasets with varying complexity. Furthermore, 12 pooling functions for combining multiple predictions were analyzed, ranging from simple statistical functions like unweighted averaging up to more complex learning-based functions like support vector machines. Our results revealed that Stacking achieved the largest performance gain of up to 13% F1-score increase. Augmenting showed consistent improvement capabilities by up to 4% and is also applicable to single model based pipelines. Cross-validation based Bagging demonstrated significant performance gain close to Stacking, which resulted in an F1-score increase of up to +11%. Furthermore, we demonstrated that simple statistical pooling functions perform as well as, or often even better than, more complex pooling functions. We concluded that the integration of ensemble learning techniques is a powerful method for any medical image classification pipeline to improve robustness and boost performance.
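Two of the simplest pooling functions analysed, unweighted averaging and majority voting, look like this on toy softmax outputs (the numbers below are our own, not from the paper's experiments):

```python
import numpy as np

# Softmax outputs of 3 ensemble members for 2 samples over 3 classes.
preds = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],   # model 1
    [[0.5, 0.4, 0.1], [0.1, 0.7, 0.2]],   # model 2
    [[0.3, 0.4, 0.3], [0.3, 0.4, 0.3]],   # model 3
])

# Unweighted averaging: mean probability per class, then argmax.
avg_pool = preds.mean(axis=0).argmax(axis=1)

# Majority vote: each model votes with its argmax, most frequent class wins.
votes = preds.argmax(axis=2)
majority_pool = np.array([np.bincount(v, minlength=3).argmax() for v in votes.T])

print(avg_pool, majority_pool)   # -> [0 1] [0 1]
```

Averaging uses the full probability mass while voting discards everything but the top class, which is one reason the two can disagree on uncertain samples.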
    Decentralized Collaborative Learning Framework for Next POI Recommendation. (arXiv:2204.06516v1 [cs.IR])
Next Point-of-Interest (POI) recommendation has become an indispensable functionality in Location-based Social Networks (LBSNs) due to its effectiveness in helping people decide the next POI to visit. However, accurate recommendation requires a vast amount of historical check-in data, thus threatening user privacy as the location-sensitive data needs to be handled by cloud servers. Although there have been several on-device frameworks for privacy-preserving POI recommendations, they are still resource-intensive when it comes to storage and computation, and show limited robustness to the high sparsity of user-POI interactions. On this basis, we propose a novel decentralized collaborative learning framework for POI recommendation (DCLR), which allows users to train their personalized models locally in a collaborative manner. DCLR significantly reduces the local models' dependence on the cloud for training, and can be used to expand arbitrary centralized recommendation models. To counteract the sparsity of on-device user data when learning each local model, we design two self-supervision signals to pretrain the POI representations on the server with geographical and categorical correlations of POIs. To facilitate collaborative learning, we innovatively propose to incorporate knowledge from either geographically or semantically similar users into each local model with attentive aggregation and mutual information maximization. The collaborative learning process makes use of communications between devices while requiring only minor engagement from the central server for identifying user groups, and is compatible with common privacy preservation mechanisms like differential privacy.
    Negative Sampling for Recommendation. (arXiv:2204.06520v1 [cs.IR])
How to effectively sample high-quality negative instances is important for training a recommendation model well. We argue that a high-quality negative should be both \textit{informative} and \textit{unbiased}. Although previous studies have proposed approaches to address informativeness in negative sampling, little has been done to discriminate false negatives from true negatives for unbiased negative sampling, let alone taking both into consideration. This paper first adopts a parameter learning perspective to analyze negative informativeness and unbiasedness in loss gradient-based model training. We argue that both negative sampling and collaborative filtering include an implicit task of negative classification, from which we report an insightful and beneficial finding about the order relation in predicted negatives' scores. Based on our finding and by regarding negatives as random variables, we next derive the class-conditional density of true negatives and that of false negatives. We also design a Bayesian classifier for negative classification, from which we define a quantitative unbiasedness measure for negatives. Finally, we propose to use a harmonic mean of informativeness and unbiasedness to sample high-quality negatives. Experimental studies validate the superiority of our negative sampling algorithm over its peers in terms of better sampling quality and better recommendation performance.
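The final selection rule, a harmonic mean of the two criteria, can be sketched as follows; the informativeness and unbiasedness scores here are random placeholders standing in for the paper's gradient-based and Bayesian estimators:

```python
import numpy as np

rng = np.random.default_rng(0)
informativeness = rng.uniform(size=8)   # placeholder for the gradient-based score
unbiasedness = rng.uniform(size=8)      # placeholder for the Bayesian unbiasedness measure

# Harmonic mean favours candidates that score well on BOTH criteria:
# a candidate that is very informative but likely a false negative is demoted.
quality = 2 * informativeness * unbiasedness / (informativeness + unbiasedness + 1e-12)
best = int(np.argmax(quality))
print(best, quality[best])
```

Compared to an arithmetic mean, the harmonic mean is dominated by the weaker of the two scores, which matches the requirement that a good negative be informative and unbiased at the same time.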
    VisCUIT: Visual Auditor for Bias in CNN Image Classifier. (arXiv:2204.05899v2 [cs.CV] UPDATED)
CNN image classifiers are widely used, thanks to their efficiency and accuracy. However, they can suffer from biases that impede their practical applications. Most existing bias investigation techniques are either inapplicable to general image classification tasks or require significant user efforts in perusing all data subgroups to manually specify which data attributes to inspect. We present VisCUIT, an interactive visualization system that reveals how and why a CNN classifier is biased. VisCUIT visually summarizes the subgroups on which the classifier underperforms and helps users discover and characterize the cause of the underperformances by revealing image concepts responsible for activating neurons that contribute to misclassifications. VisCUIT runs in modern browsers and is open-source, allowing people to easily access and extend the tool to other model architectures and datasets. VisCUIT is available at the following public demo link: https://poloclub.github.io/VisCUIT. A video demo is available at https://youtu.be/eNDbSyM4R_4.
    Coverage and Capacity Optimization in STAR-RISs Assisted Networks: A Machine Learning Approach. (arXiv:2204.06390v1 [cs.IT])
Coverage and capacity are important metrics for performance evaluation in wireless networks, but the two have several conflicting relationships, e.g., high transmit power contributes to large coverage while the resulting high inter-cell interference reduces capacity. Therefore, in order to strike a balance between coverage and capacity, a novel model is proposed for the coverage and capacity optimization of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) assisted networks. To solve the coverage and capacity optimization (CCO) problem, a machine learning-based multi-objective optimization algorithm, i.e., the multi-objective proximal policy optimization (MO-PPO) algorithm, is proposed. The core of this algorithm is a loss function-based update strategy, which calculates weights for both the coverage and capacity loss functions with a min-norm solver at each update. The numerical results demonstrate that the investigated update strategy outperforms fixed weight-based MO algorithms.
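For two objectives, a min-norm solver over the convex hull of the two loss gradients has a closed form; the sketch below is the generic two-task MGDA-style formula, which may differ in detail from the exact MO-PPO update:

```python
import numpy as np

def min_norm_weights(g1, g2):
    """Closed-form min-norm point in the convex hull of two gradients:
    minimize ||a*g1 + (1-a)*g2|| over a in [0, 1]."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        return 0.5, 0.5
    alpha = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return alpha, 1.0 - alpha   # weights for (g1, g2)

g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
a, b = min_norm_weights(g1, g2)
print(a, b)   # orthogonal, equal-norm gradients -> 0.5 0.5
```

The combined update a*g1 + b*g2 is never longer than the shorter of the two gradients, which is what keeps both objectives from being sacrificed for the other.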
    GenIE: Generative Information Extraction. (arXiv:2112.08340v3 [cs.CL] UPDATED)
Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema. Most existing works are pipelines prone to error accumulation, and all approaches are only applicable to unrealistically small numbers of entities and relations. We introduce GenIE (generative information extraction), the first end-to-end autoregressive formulation of closed information extraction. GenIE naturally exploits the language knowledge from the pre-trained transformer by autoregressively generating relations and entities in textual form. Thanks to a new bi-level constrained generation strategy, only triplets consistent with the predefined knowledge base schema are produced. Our experiments show that GenIE is state-of-the-art on closed information extraction, generalizes from fewer training data points than baselines, and scales to a previously unmanageable number of entities and relations. With this work, closed information extraction becomes practical in realistic scenarios, providing new opportunities for downstream tasks. Finally, this work paves the way towards a unified end-to-end approach to the core tasks of information extraction. Code, data and models available at https://github.com/epfl-dlab/GenIE.
    Efficient Non-parametric Bayesian Hawkes Processes. (arXiv:1810.03730v5 [cs.LG] UPDATED)
In this paper, we develop an efficient non-parametric Bayesian estimation of the kernel function of Hawkes processes. The non-parametric Bayesian approach is important because it provides flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Utilizing the finite support assumption of the Hawkes process, we efficiently sample random branching structures and thus split the Hawkes process into clusters of Poisson processes. We derive two algorithms -- a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization -- and we show that our methods have linear time complexity, both theoretically and empirically. On synthetic data, we show that our methods are able to infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state-of-the-art in goodness-of-fit and that the time complexity is linear in the size of the dataset. We also observe that on diffusions related to online videos, the learned kernels reflect the perceived longevity of different content types such as music or pet videos.
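The cluster representation that the sampler exploits also gives a direct way to simulate a Hawkes process: immigrants arrive as a homogeneous Poisson process and every event independently spawns its own Poisson number of offspring. A sketch with an exponential triggering kernel (all parameter values are arbitrary toy choices, not from the paper):

```python
import numpy as np

def simulate_hawkes_branching(mu=1.0, alpha=0.5, beta=2.0, T=50.0, seed=0):
    """Simulate a Hawkes process via its cluster (branching) representation:
    immigrants arrive as a Poisson(mu) process on [0, T]; each event spawns
    Poisson(alpha) offspring with Exp(beta) waiting times (alpha < 1)."""
    rng = np.random.default_rng(seed)
    immigrants = rng.uniform(0.0, T, rng.poisson(mu * T))
    events = list(immigrants)
    frontier = list(immigrants)
    while frontier:
        children = []
        for t in frontier:
            n_kids = rng.poisson(alpha)                    # branching ratio alpha
            kids = t + rng.exponential(1.0 / beta, n_kids)
            children.extend(kids[kids < T])
        events.extend(children)
        frontier = children
    return np.sort(np.array(events))

ts = simulate_hawkes_branching()
print(len(ts))   # expected count is roughly mu * T / (1 - alpha)
```

Because every branch is an independent Poisson cluster, this simulation (like the paper's sampler) runs in time linear in the number of events.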
    Automatic Multi-Label Prompting: Simple and Interpretable Few-Shot Classification. (arXiv:2204.06305v1 [cs.CL])
Prompt-based learning (i.e., prompting) is an emerging paradigm for exploiting knowledge learned by a pretrained language model. In this paper, we propose Automatic Multi-Label Prompting (AMuLaP), a simple yet effective method to automatically select label mappings for few-shot text classification with prompting. Our method exploits one-to-many label mappings and a statistics-based algorithm to select label mappings given a prompt template. Our experiments demonstrate that AMuLaP achieves competitive performance on the GLUE benchmark without human effort or external resources.
    Active Diffusion and VCA-Assisted Image Segmentation of Hyperspectral Images. (arXiv:2204.06298v1 [cs.CV])
Hyperspectral images encode rich structure that can be exploited for material discrimination by machine learning algorithms. This article introduces the Active Diffusion and VCA-Assisted Image Segmentation (ADVIS) for active material discrimination. ADVIS selects high-purity, high-density pixels that are far in diffusion distance (a data-dependent metric) from other high-purity, high-density pixels in the hyperspectral image. The ground truth labels of these pixels are queried and propagated to the rest of the image. The ADVIS active learning algorithm is shown to strongly outperform its fully unsupervised clustering algorithm counterpart, suggesting that the incorporation of a very small number of carefully-selected ground truth labels can result in substantially superior material discrimination in hyperspectral images.
    Receding Neuron Importances for Structured Pruning. (arXiv:2204.06404v1 [cs.LG])
Structured pruning efficiently compresses networks by identifying and removing unimportant neurons. While this can be elegantly achieved by applying sparsity-inducing regularisation on BatchNorm parameters, an L1 penalty would shrink all scaling factors rather than just those of superfluous neurons. To tackle this issue, we introduce a simple BatchNorm variation with bounded scaling parameters, based on which we design a novel regularisation term that suppresses only neurons with low importance. Under our method, the weights of unnecessary neurons effectively recede, producing a polarised bimodal distribution of importances. We show that neural networks trained this way can be pruned to a larger extent and with less deterioration. We one-shot prune VGG and ResNet architectures at different ratios on CIFAR and ImageNet datasets. In the case of VGG-style networks, our method significantly outperforms existing approaches, particularly under a severe pruning regime.
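Once training yields the polarised bimodal distribution of BatchNorm scaling factors, the one-shot pruning step itself reduces to thresholding them and dropping whole channels. A schematic NumPy sketch (the gamma values are hand-picked to mimic a bimodal distribution; this is not the paper's bounded parametrisation or regulariser):

```python
import numpy as np

# BatchNorm scaling factors of one layer after sparsity-inducing training:
# a polarised, bimodal distribution cleanly separates kept and receded neurons.
gamma = np.array([0.96, 0.02, 0.88, 0.01, 0.73, 0.03, 0.91, 0.05])
threshold = 0.5
keep = gamma > threshold               # structured pruning mask (whole neurons)

weights = np.random.default_rng(0).normal(size=(8, 16))
pruned = weights[keep]                 # drop entire output channels at once
print(keep.sum(), pruned.shape)        # -> 4 (4, 16)
```

Because whole rows (channels) are removed rather than individual weights, the pruned layer stays dense and needs no sparse kernels at inference time.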
    The Exponentially Tilted Gaussian Prior for Variational Autoencoders. (arXiv:2111.15646v3 [cs.LG] UPDATED)
An important property for deep neural networks is the ability to perform robust out-of-distribution detection on previously unseen data. This property is essential for safety purposes when deploying models in real-world applications. Recent studies show that probabilistic generative models can perform poorly on this task, which is surprising given that they seek to estimate the likelihood of training data. To alleviate this issue, we propose the exponentially tilted Gaussian prior distribution for the Variational Autoencoder (VAE), which pulls points onto the surface of a hyper-sphere in latent space. This achieves state-of-the-art results on the area under the receiver operating characteristic curve metric using just the log-likelihood that the VAE naturally assigns. Because this prior is a simple modification of the traditional VAE prior, it is faster and easier to implement than competitive methods.
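The pull toward a hyper-sphere is visible already in the radial profile of an exponentially tilted Gaussian, i.e. a density proportional to exp(tau * ||z||) * N(z; 0, I): ignoring the dimension-dependent Jacobian term, the log-density along the radius peaks at ||z|| = tau rather than at the origin. A quick check (tau is an arbitrary toy value; the paper's exact normalisation is not reproduced here):

```python
import numpy as np

tau = 5.0
r = np.linspace(0.0, 10.0, 1001)
# Radial part of the unnormalised tilted-Gaussian log-density:
# log p(z) = tau * ||z|| - ||z||^2 / 2 + const, maximised at ||z|| = tau.
log_density = tau * r - 0.5 * r**2
r_star = r[np.argmax(log_density)]
print(r_star)   # -> 5.0
```

In contrast, a standard Gaussian prior (tau = 0) peaks at the origin, which is exactly the behaviour the tilted prior is designed to move away from.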
    L3Cube-MahaNER: A Marathi Named Entity Recognition Dataset and BERT models. (arXiv:2204.06029v1 [cs.CL])
Named Entity Recognition (NER) is a basic NLP task and finds major applications in conversational and search systems. It helps us identify key entities in a sentence that are used by downstream applications. NER and similar slot filling systems for popular languages have been heavily used in commercial applications. In this work, we focus on Marathi, an Indian language spoken prominently by the people of Maharashtra state. Marathi is a low-resource language and still lacks useful NER resources. We present L3Cube-MahaNER, the first major gold standard named entity recognition dataset in Marathi. We also describe the manual annotation guidelines followed during the process. Finally, we benchmark the dataset on different CNN, LSTM, and Transformer based models like mBERT, XLM-RoBERTa, IndicBERT, MahaBERT, etc. MahaBERT provides the best performance among all the models. The data and models are available at https://github.com/l3cube-pune/MarathiNLP.
    Prediction of motor insurance claims occurrence as an imbalanced machine learning problem. (arXiv:2204.06109v1 [q-fin.ST])
The insurance industry, with its large datasets, is a natural place to use big data solutions. However, it must be stressed that a significant number of machine learning applications in the insurance industry, such as fraud detection or claim prediction, deal with the problem of learning from an imbalanced data set. This is due to the fact that frauds or claims are rare events when compared with the entire population of drivers. The problem of imbalanced learning is often hard to overcome. Therefore, the main goal of this work is to present and apply various methods of dealing with an imbalanced dataset in the context of claim occurrence prediction in car insurance. These techniques are also used to compare the results of machine learning algorithms on this binary classification task. Our study covers the following algorithms: logistic regression, decision tree, random forest, XGBoost, and a feed-forward network.
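The simplest of the imbalance-handling techniques, random oversampling of the minority class, can be sketched in a few lines (synthetic toy data standing in for the insurance dataset; class 1 plays the role of the rare claim events):

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy imbalanced claims data: roughly 95% no-claim (0), 5% claim (1).
y = (rng.uniform(size=1000) < 0.05).astype(int)
X = rng.normal(size=(1000, 3)) + y[:, None]

# Random oversampling: resample the minority class (with replacement)
# until both classes are equally represented.
minority = np.flatnonzero(y == 1)
majority = np.flatnonzero(y == 0)
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
idx = np.concatenate([majority, minority, extra])
X_bal, y_bal = X[idx], y[idx]
print(np.bincount(y_bal))   # equal class counts after oversampling
```

Oversampling keeps all majority data (unlike undersampling) at the cost of duplicating minority rows, which can encourage overfitting; class weights or SMOTE-style synthesis are common alternatives.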
    Context-based Deep Learning Architecture with Optimal Integration Layer for Image Parsing. (arXiv:2204.06214v1 [cs.CV])
Deep learning models have recently proven effective on image parsing tasks. However, deep learning models are not fully capable of exploiting visual and contextual information simultaneously. The proposed three-layer context-based deep architecture is capable of integrating context explicitly with visual information. The novel idea here is to have a visual layer to learn visual characteristics from binary class-based learners, a contextual layer to learn context, and then an integration layer to learn from both via genetic algorithm-based optimal fusion to produce a final decision. The experimental outcomes when evaluated on benchmark datasets are promising. Further analysis shows that optimized network weights can improve performance and make stable predictions.
    Deep Learning for Effective and Efficient Reduction of Large Adaptation Spaces in Self-Adaptive Systems. (arXiv:2204.06254v1 [cs.SE])
    Many software systems today face uncertain operating conditions, such as sudden changes in the availability of resources or unexpected user behavior. Without proper mitigation these uncertainties can jeopardize the system goals. Self-adaptation is a common approach to tackle such uncertainties. When the system goals may be compromised, the self-adaptive system has to select the best adaptation option to reconfigure by analyzing the possible adaptation options, i.e., the adaptation space. Yet, analyzing large adaptation spaces using rigorous methods can be resource- and time-consuming, or even be infeasible. One approach to tackle this problem is by using online machine learning to reduce adaptation spaces. However, existing approaches require domain expertise to perform feature engineering to define the learner, and support online adaptation space reduction only for specific goals. To tackle these limitations, we present 'Deep Learning for Adaptation Space Reduction Plus' -- DLASeR+ in short. DLASeR+ offers an extendable learning framework for online adaptation space reduction that does not require feature engineering, while supporting three common types of adaptation goals: threshold, optimization, and set-point goals. We evaluate DLASeR+ on two instances of an Internet-of-Things application with increasing sizes of adaptation spaces for different combinations of adaptation goals. We compare DLASeR+ with a baseline that applies exhaustive analysis and two state-of-the-art approaches for adaptation space reduction that rely on learning. Results show that DLASeR+ is effective with a negligible effect on the realization of the adaptation goals compared to an exhaustive analysis approach, and supports three common types of adaptation goals beyond the state-of-the-art approaches.
    Experimental Standards for Deep Learning Research: A Natural Language Processing Perspective. (arXiv:2204.06251v1 [cs.LG])
The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well. Yet, as with other fields employing DL techniques, there has been a lack of common experimental standards compared to more established disciplines. Starting from fundamental scientific principles, we distill ongoing discussions on experimental standards in DL into a single, widely-applicable methodology. Following these best practices is crucial to strengthening experimental evidence, improving reproducibility, and enabling scientific progress. These standards are further collected in a public repository to help them transparently adapt to future needs.
    Enabling Synthetic Data adoption in regulated domains. (arXiv:2204.06297v1 [cs.LG])
The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than algorithms, bringing forward new challenges. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. Specific approaches to address the privacy issue have been developed, such as Privacy Enhancing Technologies. However, they frequently cause loss of information, putting forward a crucial trade-off between data quality and privacy. A clever way to bypass such a conundrum relies on Synthetic Data: data obtained from a generative process that learns the real data properties. Both academia and industry have realized the importance of evaluating synthetic data quality: without all-round reliable metrics, the innovative data generation task has no proper objective function to maximize. Despite that, the topic remains under-explored. For this reason, we systematically catalog the important traits of synthetic data quality and privacy, and devise a specific methodology to test them. The result is DAISYnt (aDoption of Artificial Intelligence SYnthesis): a comprehensive suite of advanced tests, which sets a de facto standard for synthetic data evaluation. As a practical use-case, a variety of generative algorithms have been trained on real-world Credit Bureau Data. The best model has been assessed using DAISYnt on the different synthetic replicas. Further potential uses, among others, entail auditing and fine-tuning of generative models or ensuring high quality of a given synthetic dataset. From a prescriptive viewpoint, DAISYnt may eventually pave the way to synthetic data adoption in highly regulated domains, ranging from Finance to Healthcare, through Insurance and Education.
    Data-heterogeneity-aware Mixing for Decentralized Learning. (arXiv:2204.06477v1 [cs.LG])
    Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs. However, most existing approaches toward decentralized learning disregard the interaction between data heterogeneity and graph topology. In this paper, we characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes. We propose a metric that quantifies the ability of a graph to mix the current gradients. We further prove that the metric controls the convergence rate, particularly in settings where the heterogeneity across nodes dominates the stochasticity between updates for a given node. Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric using standard convex constrained optimization and sketching techniques. Through comprehensive experiments on standard computer vision and NLP benchmarks, we show that our approach leads to improvement in test performance for a wide range of tasks.
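A mixing matrix's ability to average heterogeneous node values is governed by its spectral gap; a toy gossip-averaging sketch on a 4-node ring (our own uniform weights, not the optimized data-heterogeneity-aware weights the paper proposes):

```python
import numpy as np

# Symmetric, doubly-stochastic mixing matrix for a 4-node ring graph.
W = np.array([
    [0.5 , 0.25, 0.0 , 0.25],
    [0.25, 0.5 , 0.25, 0.0 ],
    [0.0 , 0.25, 0.5 , 0.25],
    [0.25, 0.0 , 0.25, 0.5 ],
])
x = np.array([4.0, 0.0, 0.0, 0.0])   # heterogeneous node values (e.g. local gradients)
for _ in range(50):
    x = W @ x                         # one gossip averaging round per iteration
print(x)                              # every node converges to the mean, 1.0
```

The convergence rate is controlled by the second-largest eigenvalue modulus of W (0.5 for this ring), which is exactly the kind of quantity a heterogeneity-aware mixing design would optimize jointly with the data distribution.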
    Features of the Earth's seasonal hydroclimate: Characterizations and comparisons across the Koppen-Geiger climates and across continents. (arXiv:2204.06544v1 [stat.AP])
    Detailed feature investigations and comparisons across climates, continents and time series types can progress our understanding and modelling ability of the Earth's hydroclimate and its dynamics. As a step towards these important directions, we here propose and extensively apply a multifaceted and engineering-friendly methodological framework for the thorough characterization of seasonal hydroclimatic dependence, variability and change at the global scale. We apply this framework using over 13 000 quarterly temperature, precipitation and river flow time series. In these time series, the seasonal hydroclimatic behaviour is represented by 3-month means of earth-observed variables. In our analyses, we also adopt the well-established Koppen-Geiger climate classification system and define continental-scale regions with large or medium density of observational stations. In this context, we provide in parallel seasonal hydroclimatic feature summaries and comparisons in terms of autocorrelation, seasonality, temporal variation, entropy, long-range dependence and trends. We find notable differences to characterize the magnitudes of most of these features across the various Koppen-Geiger climate classes, as well as between several continental-scale geographical regions. We, therefore, deem that the consideration of the comparative summaries could be more beneficial in water resources engineering contexts than the also provided global summaries. Lastly, we apply explainable machine learning to compare the investigated features with respect to how informative they are in explaining and predicting either the main Koppen-Geiger climate or the continental-scale region, with the entropy, long-range dependence and trend features being (roughly) found to be less informative than the remaining ones at the seasonal time scale.
    A Statistical Learning View of Simple Kriging. (arXiv:2202.07365v3 [stat.ML] UPDATED)
    In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly and guarantees of the generalization capacity of predictive rules learned from such data are left to establish. We analyze here the simple Kriging task, the flagship problem in Geostatistics: the values of a square integrable random field $X=\{X_s\}_{s\in S}$, $S\subset \mathbb{R}^2$, with unknown covariance structure are to be predicted with minimum quadratic risk, based upon observing a single realization of the spatial process at a finite number of locations $s_1,\; \ldots,\; s_n$ in $S$. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, due to the non i.i.d. nature of the spatial data $X_{s_1},\; \ldots,\; X_{s_n}$ involved. In this article, nonasymptotic bounds of order $O_{\mathbb{P}}(1/n)$ are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid. These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments and hopefully pave the way for further developments in statistical learning based on spatial data.
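The simple Kriging predictor itself is short: with covariance matrix K over the observed locations and cross-covariance vector k to the target, the plug-in prediction is w^T y with w = K^{-1} k. A NumPy sketch under an assumed isotropic Gaussian covariance model (locations, values, and the length scale are our own toy choices):

```python
import numpy as np

def simple_kriging(locs, values, target, length_scale=1.0):
    """Simple Kriging with a known zero-mean isotropic Gaussian covariance:
    the predictor is w @ values with weights w = K^{-1} k."""
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    K = np.exp(-(d / length_scale) ** 2)
    k = np.exp(-(np.linalg.norm(locs - target, axis=-1) / length_scale) ** 2)
    w = np.linalg.solve(K, k)
    return w @ values

# Observations on a small regular grid (as in the paper's theoretical setting);
# predicting at an observed location reproduces the observation exactly,
# since Kriging is an exact interpolator.
locs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
vals = np.array([1.0, 2.0, 0.5, 1.5])
print(simple_kriging(locs, vals, np.array([0.0, 0.0])))   # -> 1.0
```

In practice the covariance structure is unknown and must itself be estimated from the single realization, which is precisely where the paper's generalization analysis comes in.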
    Towards Practical Robustness Analysis for DNNs based on PAC-Model Learning. (arXiv:2101.10102v2 [cs.LG] UPDATED)
    To analyse local robustness properties of deep neural networks (DNNs), we present a practical framework from a model learning perspective. Based on black-box model learning with scenario optimisation, we abstract the local behaviour of a DNN via an affine model with the probably approximately correct (PAC) guarantee. From the learned model, we can infer the corresponding PAC-model robustness property. The innovation of our work is the integration of model learning into PAC robustness analysis: that is, we construct a PAC guarantee on the model level instead of sample distribution, which induces a more faithful and accurate robustness evaluation. This is in contrast to existing statistical methods without model learning. We implement our method in a prototypical tool named DeepPAC. As a black-box method, DeepPAC is scalable and efficient, especially when DNNs have complex structures or high-dimensional inputs. We extensively evaluate DeepPAC, with 4 baselines (using formal verification, statistical methods, testing and adversarial attack) and 20 DNN models across 3 datasets, including MNIST, CIFAR-10, and ImageNet. It is shown that DeepPAC outperforms the state-of-the-art statistical method PROVERO, and it achieves more practical robustness analysis than the formal verification tool ERAN. Also, its results are consistent with existing DNN testing work like DeepGini.
    Online greedy identification of linear dynamical systems. (arXiv:2204.06375v1 [stat.ML])
This work addresses the problem of exploration in an unknown environment. For linear dynamical systems, we use an experimental design framework and introduce an online greedy policy where the control maximizes the information of the next step. In a setting with a limited number of experimental trials, our algorithm has low complexity and experimentally shows competitive performance compared to more elaborate gradient-based methods.
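The flavour of such an online greedy policy can be sketched generically: at each step, pick the admissible input that most increases the accumulated information, measured here by the log-determinant of a regularized Gram matrix. This is our own illustrative D-optimal-design variant, not necessarily the paper's exact criterion:

```python
import numpy as np

rng = np.random.default_rng(1)
candidates = rng.normal(size=(20, 3))      # admissible inputs, normalised to
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)  # unit energy

G = 1e-3 * np.eye(3)                       # regularised information (Gram) matrix
chosen = []
for _ in range(6):
    # Greedy step: pick the input that maximises the next-step information,
    # measured by log det of the rank-1-updated Gram matrix.
    gains = [np.linalg.slogdet(G + np.outer(u, u))[1] for u in candidates]
    best = int(np.argmax(gains))
    chosen.append(best)
    G += np.outer(candidates[best], candidates[best])

print(chosen, np.linalg.slogdet(G)[1])
```

Each greedy step costs only a handful of small determinant evaluations, which is the "low complexity" appeal relative to gradient-based input design.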
    Overparameterized Linear Regression under Adversarial Attacks. (arXiv:2204.06274v1 [stat.ML])
As machine learning models start to be used in critical applications, their vulnerabilities and brittleness become a pressing concern. Adversarial attacks are a popular framework for studying these vulnerabilities. In this work, we study the error of linear regression in the face of adversarial attacks. We provide bounds on the error in terms of the traditional risk and the parameter norm, and show how these bounds can be leveraged to make it possible to use analysis from non-adversarial setups to study the adversarial risk. The usefulness of these results is illustrated by shedding light on whether or not overparameterized linear models can be adversarially robust. We show that adding features to linear models might be either a source of additional robustness or brittleness. We show that these differences appear due to scaling and how the $\ell_1$ and $\ell_2$ norms of random projections concentrate. We also show how the reformulation we propose allows for solving adversarial training as a convex optimization problem. This is then used as a tool to study how adversarial training and other regularization methods might affect the robustness of the estimated models.
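The role of the parameter norm is easy to see in one line: for an $\ell_\infty$-bounded attack of size eps, the worst-case change of a linear model's output is exactly eps times the $\ell_1$ norm of the weights, attained by the sign-aligned perturbation. A quick NumPy check on toy numbers (not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)                 # linear model parameters
x = rng.normal(size=5)                 # a clean input
eps = 0.1

# Worst-case output change under ||delta||_inf <= eps is eps * ||w||_1,
# attained by the sign-aligned perturbation delta = eps * sign(w).
delta = eps * np.sign(w)
worst_case = abs(w @ (x + delta) - w @ x)
print(worst_case, eps * np.linalg.norm(w, 1))
```

This is why adding features can cut either way: extra features change how the $\ell_1$ norm of the learned weights scales relative to the $\ell_2$ norm that governs standard risk.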
    The sparse Polynomial Chaos expansion: a fully Bayesian approach with joint priors on the coefficients and global selection of terms. (arXiv:2204.06043v1 [stat.CO])
Polynomial chaos expansion (PCE) is a versatile tool widely used in uncertainty quantification and machine learning, but its successful application depends strongly on the accuracy and reliability of the resulting PCE-based response surface. High accuracy typically requires high polynomial degrees, demanding many training points, especially in high-dimensional problems, owing to the curse of dimensionality. So-called sparse PCE concepts work with a much smaller selection of basis polynomials compared to conventional PCE approaches and can overcome the curse of dimensionality very efficiently, but have to pay specific attention to their strategies of choosing training points. Furthermore, the approximation error resembles an uncertainty that most existing PCE-based methods do not estimate. In this study, we develop and evaluate a fully Bayesian approach to establish the PCE representation via joint shrinkage priors and Markov chain Monte Carlo. The suggested Bayesian PCE model directly aims to solve the two challenges named above: achieving a sparse PCE representation and estimating uncertainty of the PCE itself. The embedded Bayesian regularization via the joint shrinkage prior allows using higher polynomial degrees for given training points due to its ability to handle underdetermined situations, where the number of considered PCE coefficients could be much larger than the number of available training points. We also explore multiple variable selection methods to construct sparse PCE expansions based on the established Bayesian representations, while globally selecting the most meaningful orthonormal polynomials given the available training data. We demonstrate the advantages of our Bayesian PCE and the corresponding sparsity-inducing methods on several benchmarks.
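The projection underlying any PCE can be sketched in a few lines. The snippet below is a minimal, non-Bayesian illustration (the paper's contribution — the fully Bayesian treatment with joint shrinkage priors and MCMC — is deliberately omitted): it recovers the chaos coefficients of the toy target f(x) = x² in a probabilists' Hermite basis by Monte Carlo projection. Function names and the toy target are illustrative choices, not from the paper.

```python
import math
import random

# Probabilists' Hermite polynomials He_0..He_3, orthogonal under N(0,1)
# with E[He_k(X)^2] = k!.
def hermite(k, x):
    return [1.0, x, x * x - 1.0, x ** 3 - 3.0 * x][k]

def pce_coefficients(f, degree=3, n_samples=100_000, seed=0):
    """Monte Carlo projection: c_k = E[f(X) He_k(X)] / k! for X ~ N(0,1)."""
    rng = random.Random(seed)
    sums = [0.0] * (degree + 1)
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        fx = f(x)
        for k in range(degree + 1):
            sums[k] += fx * hermite(k, x)
    return [s / n_samples / math.factorial(k) for k, s in enumerate(sums)]

# f(x) = x^2 has the exact expansion 1 + He_2(x), i.e. c = [1, 0, 1, 0].
coeffs = pce_coefficients(lambda x: x * x)
```

The Bayesian PCE of the paper replaces this plain projection with posterior inference over the coefficients, which is what yields sparsity and uncertainty estimates for the surrogate itself.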
    A quantum generative model for multi-dimensional time series using Hamiltonian learning. (arXiv:2204.06150v1 [quant-ph])
    Synthetic data generation has proven to be a promising solution for addressing data availability issues in various domains. Even more challenging is the generation of synthetic time series data, where one has to preserve temporal dynamics, i.e., the generated time series must respect the original relationships between variables across time. Recently proposed techniques such as generative adversarial networks (GANs) and quantum-GANs lack the ability to attend to the time series specific temporal correlations adequately. We propose using the inherent nature of quantum computers to simulate quantum dynamics as a technique to encode such features. We start by assuming that a given time series can be generated by a quantum process, after which we proceed to learn that quantum process using quantum machine learning. We then use the learned model to generate out-of-sample time series and show that it captures unique and complex features of the learned time series. We also study the class of time series that can be modeled using this technique. Finally, we experimentally demonstrate the proposed algorithm on an 11-qubit trapped-ion quantum machine.
    Generalization Error Bounds for Multiclass Sparse Linear Classifiers. (arXiv:2204.06264v1 [math.ST])
    We consider high-dimensional multiclass classification by sparse multinomial logistic regression. Unlike binary classification, in the multiclass setup one can think about an entire spectrum of possible notions of sparsity associated with different structural assumptions on the regression coefficients matrix. We propose a computationally feasible feature selection procedure based on penalized maximum likelihood with convex penalties capturing a specific type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity, and show that with the properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds (in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers. The developed approach is general and can be adapted to other types of sparsity as well.
    Epistemic Neural Networks. (arXiv:2107.08924v2 [cs.LG] UPDATED)
Effective decision-making, exploration, and adaptation often require an agent to know what it knows and, also, what it does not know. This capability relies on the quality of \textit{joint} predictions of labels assigned to multiple inputs. Conventional neural networks lack this capability and, since most research has focused on marginal predictions, this shortcoming has been largely overlooked. By assessing the quality of joint predictions it is possible to determine whether a neural network effectively distinguishes between epistemic uncertainty (that due to lack of knowledge) and aleatoric uncertainty (that due to chance). We introduce the \textit{epistemic neural network} (ENN) as a general interface for uncertainty modeling in deep learning. While prior approaches to uncertainty modeling can be viewed as ENNs, the new interface facilitates comparison of joint predictions, and the design of novel architectures and algorithms. In particular, we introduce the \textit{epinet}: an architecture that can supplement any existing neural network, including pretrained models, and be trained with modest incremental computation to represent uncertainty. With an epinet, conventional neural networks outperform very large ensembles, consisting of hundreds or more particles, with orders of magnitude less computation. We demonstrate this efficacy across synthetic data, ImageNet, and sequential decision problems. As part of this effort we open-source experiment code.
    Approximate Bayesian Computation via Classification. (arXiv:2111.11507v3 [stat.ME] UPDATED)
    Approximate Bayesian Computation (ABC) enables statistical inference in simulator-based models whose likelihoods are difficult to calculate but easy to simulate from. ABC constructs a kernel-type approximation to the posterior distribution through an accept/reject mechanism which compares summary statistics of real and simulated data. To obviate the need for summary statistics, we directly compare empirical distributions with a Kullback-Leibler (KL) divergence estimator obtained via contrastive learning. In particular, we blend flexible machine learning classifiers within ABC to automate fake/real data comparisons. We consider the traditional accept/reject kernel as well as an exponential weighting scheme which does not require the ABC acceptance threshold. Our theoretical results show that the rate at which our ABC posterior distributions concentrate around the true parameter depends on the estimation error of the classifier. We derive limiting posterior shape results and find that, with a properly scaled exponential kernel, asymptotic normality holds. We demonstrate the usefulness of our approach on simulated examples as well as real data in the context of stock volatility estimation.
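For readers unfamiliar with the baseline the paper builds on, plain rejection ABC with a hand-picked summary statistic can be sketched as below; the paper's contribution — replacing summary statistics with classifier-based KL divergence estimates — is not shown here. The simulator, prior, and tolerance are illustrative choices.

```python
import random
import statistics

def simulate(theta, n, rng):
    """Simulator: n draws from N(theta, 1); we only need sampling, no likelihood."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def rejection_abc(observed, n_draws=10_000, eps=0.05, seed=1):
    """Accept theta drawn from the prior when the sample means match within eps."""
    rng = random.Random(seed)
    obs_stat = statistics.fmean(observed)  # summary statistic: the sample mean
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)     # flat prior over the parameter
        sim = simulate(theta, len(observed), rng)
        if abs(statistics.fmean(sim) - obs_stat) <= eps:
            accepted.append(theta)
    return accepted

rng = random.Random(0)
observed = simulate(2.0, 100, rng)         # "real" data with true theta = 2
posterior = rejection_abc(observed)        # approximate posterior samples
```

The accepted samples concentrate near the true parameter; the paper's classifier-based comparison removes the need to choose `obs_stat` by hand and comes with concentration guarantees tied to the classifier's estimation error.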
    Random Graph Embedding and Joint Sparse Regularization for Multi-label Feature Selection. (arXiv:2204.06445v1 [stat.ML])
Multi-label learning is often used to mine the correlation between variables and multiple labels, and its research focuses on fully extracting the information between variables and labels. The $\ell_{2,1}$ regularization is often used to obtain a sparse coefficient matrix, but the problem of multicollinearity among variables cannot be effectively solved. In this paper, the proposed model can choose the most relevant variables by solving a joint constrained optimization problem using the $\ell_{2,1}$ regularization and Frobenius regularization. In manifold regularization, we carry out a random walk strategy based on the joint structure to construct a neighborhood graph, which is highly robust to outliers. In addition, we give an iterative algorithm for the proposed method and prove its convergence. Experiments on real-world data sets also show that the comprehensive performance of our method is consistently better than classical methods.
    Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information. (arXiv:2103.07084v2 [stat.ML] UPDATED)
    Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies showed that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Although existing methods learn diverse solutions by using the mutual information as unsupervised rewards, such an approach often suffers from the bias of the gradient estimator induced by value function approximation. In this study, we propose a novel method that can learn diverse solutions without suffering the bias problem. In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, instead of using the mutual information as unsupervised rewards as in previous studies. Through extensive experiments on robot locomotion tasks, we demonstrate that the proposed method successfully learns an infinite set of diverse solutions by learning continuous latent variables, which is more challenging than learning a finite number of solutions. Subsequently, we show that our method enables more effective few-shot adaptation compared with existing methods.
    Utilizing variational autoencoders in the Bayesian inverse problem of photoacoustic tomography. (arXiv:2204.06270v1 [physics.comp-ph])
There has been an increasing interest in utilizing machine learning methods in inverse problems and imaging. Most of the work has, however, concentrated on image reconstruction problems, and the number of studies regarding the full solution of the inverse problem is limited. In this work, we study a machine learning based approach for the Bayesian inverse problem of photoacoustic tomography. We develop an approach for estimating the posterior distribution in photoacoustic tomography based on the variational autoencoder. The approach is evaluated with numerical simulations and compared to the solution of the inverse problem using a Bayesian approach.
    Time-uniform central limit theory with applications to anytime-valid causal inference. (arXiv:2103.06476v3 [math.ST] UPDATED)
    This work introduces time-uniform analogues of confidence intervals based on the central limit theorem (CLT). Our methods take the form of confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times, incurring no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, requiring strong assumptions on the data, while the classical (fixed-time) CLT is ubiquitous due to the weak assumptions it imposes. Our work bridges the gap by introducing time-uniform CSs that only require CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles like the seminal work of Koml\'os, Major, and Tusn\'ady to uniformly approximate the entire sample average process by an implicit Brownian motion. Applying Robbins' normal mixture martingale method to this Brownian motion then yields closed-form time-uniform boundaries. We combine these boundaries with doubly robust estimators to derive nonparametric CSs for the average treatment effect (and other causal estimands). These allow randomized experiments and observational studies to be continuously monitored and adaptively stopped, all while controlling the type-I error.
    Sigma-Delta and Distributed Noise-Shaping Quantization Methods for Random Fourier Features. (arXiv:2106.02614v2 [cs.LG] UPDATED)
    We propose the use of low bit-depth Sigma-Delta and distributed noise-shaping methods for quantizing the Random Fourier features (RFFs) associated with shift-invariant kernels. We prove that our quantized RFFs -- even in the case of $1$-bit quantization -- allow a high accuracy approximation of the underlying kernels, and the approximation error decays at least polynomially fast as the dimension of the RFFs increases. We also show that the quantized RFFs can be further compressed, yielding an excellent trade-off between memory use and accuracy. Namely, the approximation error now decays exponentially as a function of the bits used. Moreover, we empirically show by testing the performance of our methods on several machine learning tasks that our method compares favorably to other state of the art quantization methods in this context.
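As a minimal sketch of the unquantized random Fourier feature construction the paper starts from: features of the form sqrt(2/D)·cos(w·x + b) give an unbiased estimate of the Gaussian kernel. The last line only indicates where quantization enters via naive one-bit rounding — the paper's Sigma-Delta and noise-shaping schemes are considerably smarter than this and are not reproduced here.

```python
import math
import random

def rff_features(x, weights, biases):
    """Random Fourier features for the Gaussian kernel k(x,y) = exp(-||x-y||^2 / 2)."""
    d = len(weights)
    return [math.sqrt(2.0 / d) * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, biases)]

random.seed(0)
dim, n_feat = 3, 4000
weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_feat)]  # w ~ N(0, I)
biases = [random.uniform(0, 2 * math.pi) for _ in range(n_feat)]             # b ~ U(0, 2*pi)

x, y = [0.2, -0.1, 0.4], [0.1, 0.3, -0.2]
zx, zy = rff_features(x, weights, biases), rff_features(y, weights, biases)

approx = sum(a * b for a, b in zip(zx, zy))                    # full-precision estimate
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)

# Naive 1-bit quantization keeps only each feature's sign at fixed magnitude;
# the paper's noise-shaping quantizers achieve much better accuracy per bit.
qx = [math.copysign(math.sqrt(2.0 / n_feat), v) for v in zx]
```

With D = 4000 features the full-precision inner product tracks the exact kernel value to a few percent; the paper's point is that carefully shaped low-bit quantization loses very little of this accuracy.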
    Approximating Continuous Functions on Persistence Diagrams Using Template Functions. (arXiv:1902.07190v3 [cs.CG] UPDATED)
    The persistence diagram is an increasingly useful tool from Topological Data Analysis, but its use alongside typical machine learning techniques requires mathematical finesse. The most success to date has come from methods that map persistence diagrams into vector spaces, in a way which maximizes the structure preserved. This process is commonly referred to as featurization. In this paper, we describe a mathematical framework for featurization called \emph{template functions}, and we show that it addresses the problem of approximating continuous functions on compact subsets of the space of persistence diagrams. Specifically, we begin by characterizing relative compactness with respect to the bottleneck distance, and then provide explicit theoretical methods for constructing compact-open dense subsets of continuous functions on persistence diagrams. These dense subsets -- obtained via template functions -- are leveraged for supervised learning tasks with persistence diagrams. Specifically, we test the method for classification and regression algorithms on several examples including shape data and dynamical systems.
    Deep Probabilistic Time Series Forecasting using Augmented Recurrent Input for Dynamic Systems. (arXiv:2106.05848v2 [cs.LG] UPDATED)
Demand for probabilistic time series forecasting has recently risen in various dynamic system scenarios, for example, system identification and prognostics and health management of machines. To this end, we combine the advances in both deep generative models and the state space model (SSM) to come up with a novel, data-driven deep probabilistic sequence model. Specifically, we follow the popular encoder-decoder generative structure to build a recurrent neural network (RNN) assisted variational sequence model on an augmented recurrent input space, which can induce rich stochastic sequence dependency. Besides, in order to alleviate the inconsistency of the posterior between training and predicting, as well as improving the mining of dynamic patterns, we (i) propose using a lagged hybrid output as input for the posterior at the next time step, which brings training and predicting into alignment; and (ii) further devise a generalized auto-regressive strategy that encodes all the historical dependencies for the posterior. Thereafter, we first investigate the methodological characteristics of the proposed deep probabilistic sequence model on toy cases, and then comprehensively demonstrate the superiority of our model against existing deep probabilistic SSM models through extensive numerical experiments on eight system identification benchmarks from various dynamic systems. Finally, we apply our sequence model to a real-world centrifugal compressor forecasting problem, and again verify its outstanding performance by quantifying the time series predictive distribution.
    Efficient Non-parametric Bayesian Hawkes Processes. (arXiv:1810.03730v5 [cs.LG] UPDATED)
    In this paper, we develop an efficient nonparametric Bayesian estimation of the kernel function of Hawkes processes. The non-parametric Bayesian approach is important because it provides flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Utilizing the finite support assumption of the Hawkes process, we efficiently sample random branching structures and thus, we split the Hawkes process into clusters of Poisson processes. We derive two algorithms -- a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization -- and we show that our methods have a linear time complexity, both theoretically and empirically. On synthetic data, we show our methods to be able to infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state-of-the-art in goodness-of-fit and that the time complexity is linear in the size of the dataset. We also observe that on diffusions related to online videos, the learned kernels reflect the perceived longevity for different content types such as music or pets videos.
    Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity. (arXiv:2204.06242v1 [stat.ML])
Many real-world systems are described not only by data from a single source but via multiple data views. For example, in genomic medicine, a patient can be described by data from different molecular layers. This raises the need for multi-view models that are able to disentangle variation within and across data views in an interpretable manner. Latent variable models with structured sparsity are a commonly used tool to address this modeling task, but interpretability is cumbersome since it requires a direct inspection and interpretation of each factor by a specialized domain expert. Here, we propose MuVI, a novel approach for domain-informed multi-view latent variable models, facilitating the analysis of multi-view data in an inherently explainable manner. We demonstrate that our model (i) is able to integrate noisy domain expertise in the form of feature sets, (ii) is robust to noise in the encoded domain knowledge, (iii) results in identifiable factors and (iv) is able to infer interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.
    Time series features for supporting hydrometeorological explorations and predictions in ungauged locations using large datasets. (arXiv:2204.06540v1 [stat.ME])
    Regression-based frameworks for streamflow regionalization are built around catchment attributes that traditionally originate from catchment hydrology, flood frequency analysis and their interplay. In this work, we deviated from this traditional path by formulating and extensively investigating the first regression-based streamflow regionalization frameworks that largely emerge from general-purpose time series features for data science and, more precisely, from a large variety of such features. We focused on 28 features that included (partial) autocorrelation, entropy, temporal variation, seasonality, trend, lumpiness, stability, nonlinearity, linearity, spikiness, curvature and others. We estimated these features for daily temperature, precipitation and streamflow time series from 511 catchments, and then merged them within regionalization contexts with traditional topographic, land cover, soil and geologic attributes. Precipitation and temperature features (e.g., the spectral entropy, seasonality strength and lag-1 autocorrelation of the precipitation time series, and the stability and trend strength of the temperature time series) were found to be useful predictors of many streamflow features. The same applies to traditional attributes, such as the catchment mean elevation. Relationships between predictor and dependent variables were also revealed, while the spectral entropy, the seasonality strength and several autocorrelation features of the streamflow time series were found to be more regionalizable than others.
    SRMD: Sparse Random Mode Decomposition. (arXiv:2204.06108v1 [eess.SP])
Signal decomposition and multiscale signal analysis provide many useful tools for time-frequency analysis. We propose a random feature method for analyzing time-series data by constructing a sparse approximation to the spectrogram. The randomization is both in the time window locations and the frequency sampling, which lowers the overall sampling and computational cost. The sparsification of the spectrogram leads to a sharp separation between time-frequency clusters which makes it easier to identify intrinsic modes, and thus leads to a new data-driven mode decomposition. The applications include signal representation, outlier removal, and mode decomposition. On the benchmark tests, we show that our approach outperforms other state-of-the-art decomposition methods.
    GenIE: Generative Information Extraction. (arXiv:2112.08340v3 [cs.CL] UPDATED)
    Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema. Most existing works are pipelines prone to error accumulation, and all approaches are only applicable to unrealistically small numbers of entities and relations. We introduce GenIE (generative information extraction), the first end-to-end autoregressive formulation of closed information extraction. GenIE naturally exploits the language knowledge from the pre-trained transformer by autoregressively generating relations and entities in textual form. Thanks to a new bi-level constrained generation strategy, only triplets consistent with the predefined knowledge base schema are produced. Our experiments show that GenIE is state-of-the-art on closed information extraction, generalizes from fewer training data points than baselines, and scales to a previously unmanageable number of entities and relations. With this work, closed information extraction becomes practical in realistic scenarios, providing new opportunities for downstream tasks. Finally, this work paves the way towards a unified end-to-end approach to the core tasks of information extraction. Code, data and models available at https://github.com/epfl-dlab/GenIE.
    Estimators of Entropy and Information via Inference in Probabilistic Models. (arXiv:2202.12363v2 [stat.ML] UPDATED)
    Estimating information-theoretic quantities such as entropy and mutual information is central to many problems in statistics and machine learning, but challenging in high dimensions. This paper presents estimators of entropy via inference (EEVI), which deliver upper and lower bounds on many information quantities for arbitrary variables in a probabilistic generative model. These estimators use importance sampling with proposal distribution families that include amortized variational inference and sequential Monte Carlo, which can be tailored to the target model and used to squeeze true information values with high accuracy. We present several theoretical properties of EEVI and demonstrate scalability and efficacy on two problems from the medical domain: (i) in an expert system for diagnosing liver disorders, we rank medical tests according to how informative they are about latent diseases, given a pattern of observed symptoms and patient attributes; and (ii) in a differential equation model of carbohydrate metabolism, we find optimal times to take blood glucose measurements that maximize information about a diabetic patient's insulin sensitivity, given their meal and medication schedule.
    How flat is a normal mixture on top?
Male and female heights both have a standard deviation of about 3 inches, with means of 70 inches and 64 inches. That’s a good first-pass model using round numbers. If you ask what the height of an average adult is, not specifying male or female, you get a mixture of two normal distributions. If we […] How flat is a normal mixture on top? first appeared on John D. Cook.
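The flat top is easy to check numerically with the round numbers above. With the means exactly two standard deviations apart (70 − 64 = 6 = 2 × 3), each component density has an inflection point at the midpoint, so the mixture's second derivative vanishes at 67 inches. A quick sketch:

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, mu1=64.0, mu2=70.0, sigma=3.0):
    """Equal-weight mixture of the two height distributions."""
    return 0.5 * normal_pdf(x, mu1, sigma) + 0.5 * normal_pdf(x, mu2, sigma)

mid = 67.0                      # midpoint between the means
center = mixture_pdf(mid)
nearby = mixture_pdf(mid + 0.5)
# Because the means are exactly 2 sigma apart, the second derivative of the
# mixture is zero at the midpoint: the density is remarkably flat near 67 in.
```

Moving half an inch away from the midpoint changes the density only in the fifth decimal place, which makes the "how flat is it?" question well worth a closer look.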

    Best sample text for voice synthesis? [D]
I'm planning to create a clone of my own voice. Is there some kind of ideal sample text to record? I need 300 sentences. submitted by /u/headwar
    [R] ML model to generate paths (lines) for a given image
I am a researcher working on creating paths to indicate the primary and secondary neuronal connections in Corneal Confocal Microscopy images. The ground truth I have is images and sets of two-dimensional lines that indicate the primary and secondary paths as indicated in the image below (ignore the green dots). The secondary and primary paths are always connected to each other. https://preview.redd.it/r025qcpjddt81.jpg?width=834&format=pjpg&auto=webp&s=11a641988e0c6f8c1e17290fe9588f89ef530635 I am looking to find the most appropriate models to use for this task. The first thing that came to my mind was semantic segmentation. However, I am looking for other approaches that may be more suitable for drawing 1-pixel lines, especially since the ground truth paths are indicated as 1-pixel-wide lines (1-pixel thickness) but the connections in the images have wider thicknesses. Any ideas for architectures or methods? submitted by /u/madr3z
    [N] Followup response from BAAI on "A Roadmap for Big Model"
Source: https://www.baai.ac.cn/portal/article/index/cid/4/id/404.html Statement on the Alleged Plagiarism by “A Roadmap for Big Model” It has come to our attention that the survey report “A Roadmap for Big Model” uploaded on arXiv by a BAAI team is suspected of plagiarism. Immediately upon learning of the allegations, an internal investigation was organized to confirm the issue. BAAI is also initiating an independent review by third-party experts to further assess the issue and accountabilities. As a research institution that attaches great importance to academic standards, BAAI holds a zero-tolerance policy towards academic misconduct. We express our sincerest apologies to the authors of the original papers and to all of those affected. The report in question constitutes a collection …
    [P] Image Restoration Using Swin Transformer in JavaScript
Important note: right now, the model only supports upsampling from any dimension to at most 256 pixels. I'll likely fix this restriction in the next few days. A few days back, I was searching for AI-based image upsampling models for use within an offline JavaScript app. The latest approaches, such as SwinIR, were unavailable for JavaScript, so I created a notebook that converts the SwinIR model from torch to TFJS in a relatively short Kaggle kernel. I believe other transformer architectures can also be ported to JS like this. This is the link to the original paper of SwinIR. It requires around 1 GB of RAM to run. The size of the model folder is 44 MB. It is quantized to float16. Anyway, I hope someone will find this useful for their website or some other app. submitted by /u/Deep-Station-1746
    [D] Replacing 3x3 convolutions with two 2x2 convolutions
Something that's always puzzled me is the ubiquity of 3x3 convolutions in computer vision. If I recall past discussion accurately, the main benefits of odd-sized kernels are that: With the proper padding they maintain the width and height of their inputs, which makes it easier to think about/design neural network architectures. This is not possible with even kernels, unless you bite the bullet and use asymmetric padding (which is rejected for aesthetic reasons). Output pixels have a 1-to-1 mapping with input pixels (since odd-sized kernels have a proper "center"). This is considered a nice property -- perhaps (for instance) avoiding aliasing issues during segmentation tasks. Given these two points, we use 3x3 convolutions since they're the smallest odd-sized filter (exclud…
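One way to make the trade-off behind the question concrete: without a nonlinearity in between, stacking two 2x2 convolutions is equivalent to a single convolution whose kernel is the full convolution of the two 2x2 kernels — a 3x3 kernel — using 8 weights instead of 9 per input/output channel pair (biases, strides, and channels ignored here). A dependency-free sketch:

```python
def conv2d_full(k1, k2):
    """Full 2D convolution of two kernels. Composing two linear convolution
    layers is equivalent to a single convolution with this combined kernel."""
    h1, w1, h2, w2 = len(k1), len(k1[0]), len(k2), len(k2[0])
    out = [[0.0] * (w1 + w2 - 1) for _ in range(h1 + h2 - 1)]
    for i in range(h1):
        for j in range(w1):
            for a in range(h2):
                for b in range(w2):
                    out[i + a][j + b] += k1[i][j] * k2[a][b]
    return out

# Two arbitrary 2x2 kernels compose into a single effective 3x3 kernel.
k1 = [[1.0, 2.0], [3.0, 4.0]]
k2 = [[0.5, -1.0], [2.0, 0.25]]
effective = conv2d_full(k1, k2)

params_two_2x2 = 2 * (2 * 2)   # 8 weights across the two stacked layers
params_one_3x3 = 3 * 3         # 9 weights for the single layer
```

Note that inserting a nonlinearity between the two 2x2 layers breaks this equivalence, which is precisely where the extra non-linearity the post alludes to would come from.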
[R] Do Deep Neural Networks Contribute to Multivariate Time Series Anomaly Detection?
submitted by /u/MVTS_Ano
[D] Open problems in modern RL that don't need massive computational resources
What are open and/or interesting problems in modern reinforcement learning that can be tackled by the average PhD/PostDoc who doesn't have access to a massive compute cluster? The problem shouldn't need us to train our model for 10 months like OpenAI's Dota2 model. Please share your thoughts. submitted by /u/ginger_beer_m
    [D] What are the biggest developments in CV in last 5 years?
I've been helping a friend of mine learn about CV, but my knowledge starts getting spotty around 2017-2018. In this spirit I'm hoping to discuss the biggest developments in CV in the last 5 years. I know that ViTs have been developed in that time but I'm hoping to fill in my knowledge gaps! A list of topics, important papers, big ideas, or anything else is much appreciated! Edit: Thank you for the discussion everyone 🙌! I've been in and out of meetings but am reading through all responses now submitted by /u/SleekEagle
[P] How and where do you serve your model? Using Kubernetes, Docker, bare metal? Self-developed or existing tools?
Hi, I’m a machine learning platform engineer. I’ve been using, exploring, and developing model deployment tools and platforms for several years. Very often, I have found that many of the tools or managed AI platform services are not well received by users. Some think these tools are unnecessarily complicated. I'm currently developing a library in my free time trying to fill the gap, and I also want the library to integrate well with most users' deployment environments. Would you like to share how and where you serve your model? Using Kubernetes? Self-developed or existing tools? Thanks~ P.S. If you are interested, you can visit my project to submit an issue/PR or join the discussions; help is welcome: Pinferencia submitted by /u/Remote_Cancel_7977
    [P] I created a YouTube Thumbnail Dataset, and need some insight
Hi guys! I recently created & published a dataset of YouTube video thumbnails on Kaggle (YouTube Thumbnail Dataset). I've tried to make the dataset as diverse as possible; it contains thumbnails from all varieties of YouTube channels. This dataset goes hand in hand with another dataset (containing YouTube video annotations) that I created, namely YouTubers Saying Things. The dataset contains 91 unique YouTube channels and 10 categories; these categories are assigned by me manually to the channels (Comedy, Science, Automobile, VideoGames, Food, Entertainment, Informative, Blog, News, Tech). All kinds of feedback and criticism are welcome, and if you want some particular channel to be included in both these datasets, feel free to comment on this post or raise an issue on the GitHub repositories for both datasets; I will surely add them in the next version. Links to the datasets: YouTubers Saying Things Kaggle, Github YouTube Thumbnail Dataset Kaggle, Github submitted by /u/alcatraz2217
Improve XGBoost classification algorithm with a small dataset, based on a similar bigger dataset? [D]
Hi, I am doing research on transfer learning for XGBoost. I am currently working with a small dataset from a company in Spain (short history) and the scoring is poor. I have worked before with the same company in France, and the scoring was great as I had plenty of data thanks to a long history. How could I improve my score on the data from Spain with the help of the data from France? Could I use transfer learning, data mutualization, or data augmentation? If anyone has faced a similar problem before, or has read papers about it, I would love to hear about it. Thank you! submitted by /u/Cutset
    [R] A Modern Self-Referential Weight Matrix That Learns to Modify Itself
    submitted by /u/hardmaru [link] [comments]
    [D] What are some interesting hidden stuff about CNNs?
    Hey all, I'm trying to get up to date with the deep learning literature, so last week I went through CNNs. Here's a general view of what I've learned so far:
    - Large filters are wasteful; you can get better accuracy with smaller filters and more non-linearities.
    - Depth matters most for CNNs, more than width or filter size.
    - ReLU activations are generally better, as sigmoid/tanh gradients tend to vanish towards the saturated ends.
    - Convolution layers are only translation equivariant; stacking multiple features together and passing them through MaxPool helps with rotation and scale invariance, although not completely.
    - Residual connections help address vanishing gradients and improve the overall training procedure.
    - Inception models worked well because they mixed different filter sizes together, helping the model learn diverse features.
    - Most current work is with Transformers, although I'm not sure why; ConvNeXt shows similar performance can be achieved with large CNNs.
    Do add to this if I missed anything, or if there's anything you don't know about. submitted by /u/Bibbidi_Babbidi_Boo [link] [comments]  ( 5 min )
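The translation point above can be checked numerically: convolution shifts its output when you shift its input (equivariance), and it is pooling that then buys a degree of invariance. A toy 1-D sketch in plain NumPy:

```python
import numpy as np

# Checking the translation behaviour of convolution on a toy 1-D signal:
# shifting the input shifts the feature map by the same amount.
def conv1d_valid(x, k):
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

x = np.array([0., 1., 3., 2., 0., 0., 0.])
k = np.array([1., -1.])          # a simple edge-detector kernel

y = conv1d_valid(x, k)
x_shifted = np.roll(x, 2)        # translate the input by 2 samples
y_shifted = conv1d_valid(x_shifted, k)

# Away from the boundary, the feature map is translated by the same 2 samples:
print(np.allclose(y_shifted[2:], y[:-2]))  # True
```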
    [D] How to create scenes with text - Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors, a 5-minute paper summary by Casual GAN Papers
    The authors of Make-A-Scene propose a novel text-to-image method that leverages the information from an additional input condition called a “scene” in the form of segmentation tokens to improve the quality of generated images and enable scene editing, out-of-distribution prompts, and text-editing of anchor scenes. As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/284 Blog post: https://www.casualganpapers.com/text-to-image-vqvae-scene-generation/Make-A-Scene-explained.html Make-A-Scene arxiv / code (by Casual GAN Papers Community) Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
  • Open

    Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed
    I’m sure we all remember the story of “The Little Engine That Could.” A little railroad engine was built for pulling a few cars on and off the switches. When more powerful engines are asked to pull a load over a steep hill, they respond “I can’t; that is too much a pull for me”.… Read More »Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed The post Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed appeared first on Data Science Central.  ( 6 min )
    DSC Weekly Digest 4/12/2022: Demographics Drives Analytics
    The Los Angeles Times recently reported on a growing problem not just for California School Districts, but across much of the Northern Hemisphere: The number of children entering school has been dropping steadily for five years now, and is changing the dynamics of education. What’s worse, those declines are accelerating. Sometimes understanding the future comes… Read More »DSC Weekly Digest 4/12/2022: Demographics Drives Analytics The post DSC Weekly Digest 4/12/2022: Demographics Drives Analytics appeared first on Data Science Central.  ( 6 min )
  • Open

    Control access to Amazon SageMaker Feature Store offline using AWS Lake Formation
    You can establish feature stores to provide a central repository for machine learning (ML) features that can be shared with data science teams across your organization for training, batch scoring, and real-time inference. Data science teams can reuse features stored in the central repository, avoiding the need to reengineer feature pipelines for different projects and […]  ( 10 min )
    Manage dialog to elicit Amazon Lex slots in Amazon Connect contact flows
    Amazon Lex can add powerful automation to contact center solutions, so you can enable self-service via interactive voice response (IVR) interactions or route calls to the appropriate agent based on caller input. These capabilities can increase customer satisfaction by streamlining the user experience, and improve containment rates in the contact center. In both the self-service […]  ( 6 min )
  • Open

    AI Trippy Dream 35 - Psychedelic Special Request
    submitted by /u/LordPewPew777 [link] [comments]
    Ohio State University Researchers Develop SAT2LoD2: An Open-Source Python Tool For 3D Landscape Modelling Using Satellite Imagery
    3D landscape modeling has seen a rise in popularity and applications in recent years, with countless uses in civil engineering, earth sciences, military applications, and many other fields. Geometric 3D models are typically developed using the City Geography Markup Language (CityGML), and the Level-of-Detail (LoD) building model is the preferred model for building 3D models with CityGML. Satellite imagery for landscape modeling has the advantage of covering a wide area at low cost. However, developing LoD2 models from satellite imagery remains a big challenge: building models in this way involves complex steps demanding heuristics-based approaches and ML-based detection paradigms. In a recent paper, researchers at the Ohio State University propose SAT2LoD2 to facilitate the development of 3D landscape models. SAT2LoD2 is an open-source, Python-based, GUI-enabled piece of software that takes satellite images as inputs and returns LoD2 building models as outputs. The software can also take road networks and custom maps as additional inputs for better results. Continue Reading Paper: https://arxiv.org/pdf/2204.04139v1.pdf Github: https://github.com/gdaosu/lod2buildingmodel submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Are there AIs which are able to simulate a human body when you shoot/hit it, that you can use for video games?
    submitted by /u/TheblackRook3 [link] [comments]  ( 1 min )
    Digital Folktales, a collection of short stories about internet folklore, written and illustrated by Artificial Intelligence
    submitted by /u/fabianmosele [link] [comments]  ( 1 min )
    https://youtu.be/0x0to1wNh6s
    https://youtu.be/0x0to1wNh6s A new enterprise project model supported by AI. In the near future, the growing introduction of automation and artificial intelligence will require the updating of most of the activities in the production world, along with changes to contracts, tasks, and integration processes between man and machine. This is supported by the Accenture "IT's Learning" study, according to which 81% of jobs will suffer the impact of AI and robotization. submitted by /u/neologos52 [link] [comments]  ( 1 min )
    How to improve your video editing software with AI?
    submitted by /u/tah_zem [link] [comments]
    Top Ethical Challenges in AI – The Price of Progress
    What does 2022 look like for AI? Let's find out. https://us.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]
    Bias in Artificial Intelligence: Is Diversity the Key to the Future Of AI?
    submitted by /u/JencyJane [link] [comments]
    Wrote about KNN — Introduction to DataScience Book
    submitted by /u/mindaslab [link] [comments]
  • Open

    Measuring Goodhart’s Law
    Goodhart’s law famously says: “When a measure becomes a target, it ceases to be a good measure.” Although originally from economics, it’s something we have to grapple with at OpenAI when figuring out how to optimize objectives that are difficult or costly to measure.  ( 4 min )
  • Open

    MIT Schwarzman College of Computing unveils Break Through Tech AI
    New program strives to bridge the talent gap for underrepresented groups in the tech industry.  ( 5 min )
    Engineers enlist AI to help scale up advanced solar cell manufacturing
    Perovskite materials would be superior to silicon in PV cells, but manufacturing such cells at scale is a huge hurdle. Machine learning can help.  ( 7 min )
  • Open

    Simple and Effective Zero-Shot Task-Oriented Dialogue
    Posted by Jeffrey Zhao and Raghav Gupta, Software Engineers, Google Research Modern conversational agents need to integrate with an ever-increasing number of services to perform a wide variety of tasks, from booking flights and finding restaurants, to playing music and telling jokes. Adding this functionality can be difficult — for each new task, one needs to collect new data and retrain the models that power the conversational agent. This is because most task-oriented dialogue (TOD) models are trained on a single task-specific ontology. An ontology is generally represented as a list of possible user intents (e.g., if the user wants to book a flight, if the user wants to play some music, etc.) and possible parameter slots to extract from the conversation (e.g., the date of the flight, the…  ( 8 min )
  • Open

    Hilbert transform and Fourier series
    A few days ago I wrote about the Hilbert transform and gave as an example that the Hilbert transform of sine is cosine. We’ll bootstrap that example to find the Hilbert transform of any periodic function from its Fourier series. The Hilbert transform of a function f(t) is a function fH(x) defined by where the […] Hilbert transform and Fourier series first appeared on John D. Cook.  ( 2 min )
    Logarithms yearning to be free
    I got an evaluation copy of The Best Writing on Mathematics 2021 yesterday. One article jumped out as I was skimming the table of contents: A Zeroth Power Is Often a Logarithm Yearning to Be Free by Sanjoy Mahajan. Great title. There are quite a few theorems involving powers that have an exceptional case that […] Logarithms yearning to be free first appeared on John D. Cook.  ( 2 min )
  • Open

    Is the number of dimensions in the latent space equal to the number of the neurons of the layer? Or perhaps number of neurons in the whole neural network?
    https://preview.redd.it/qtgp7dzx2bt81.png?width=850&format=png&auto=webp&s=ccf391e8a1613d5405c137296bdf853010fc3f19 Not speaking specifically about autoencoders here, but about general neural networks. If I understand correctly, "latent space" refers to one of the fully connected layers of the network, and the dimensionality of the space is equal to the number of neurons in that layer. This would mean that each of the layers has a different "latent space" representation of the learned data distribution. Do I understand it correctly? I got really confused because people sometimes seem to refer to latent space as all of the possible activations of all of the neurons in the network (each neuron of the network is one dimension of the latent space), OR EVEN as all of the PARAMETERS of the network (each parameter is one dimension of the latent space (??)). Do we have separate names for these? What do we call the parameter space of a neural network? Is my original intuition even correct? submitted by /u/bzqp2 [link] [comments]  ( 1 min )
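The first intuition in the question can be made concrete: if we pick one hidden layer as "the" latent space, its dimensionality is that layer's neuron count, and each input maps to one point in it. A toy sketch (random weights, no training, purely illustrative):

```python
import numpy as np

# One fully connected hidden layer; its activations for a given input are
# one point in a latent space whose dimensionality equals its neuron count.
rng = np.random.default_rng(0)

x = rng.normal(size=10)          # input: 10-dimensional
W1 = rng.normal(size=(32, 10))   # hidden layer: 32 neurons
W2 = rng.normal(size=(2, 32))    # output layer: 2 neurons

latent = np.maximum(0.0, W1 @ x) # ReLU activations: a point in R^32
output = W2 @ latent             # a point in R^2

print(latent.shape, output.shape)
```

Each layer defines its own such space; "latent space" is usually reserved for a designated bottleneck layer, while the space of all weights is conventionally called the parameter (or weight) space, which is a different object.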
    Researchers Propose a Novel Framework ‘LilNetX’ For Training Deep Neural Network With Extreme Model Compression, and Structured Sparsification
    In the paper ‘LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification’, the researchers discuss the importance of large, parameter-heavy, computationally costly architectures in deep neural networks (DNNs) and how they improve computer vision tasks. They also note that the problem is not as simple as it seems: as DNNs become more common in industry, they frequently need to be trained multiple times, communicated across networks to various devices, and executed under hardware limits, all with minimum loss of accuracy. The question then arises of how to reduce model size on devices while still improving run time. Explorations in this field have tended to take one of two paths: lowering model size via compression approaches, or reducing computing demands through model pruning. The main achievement of this research from the University of Maryland and Google Research is the introduction of LilNetX, an end-to-end trainable neural network technique that allows learning models with specified accuracy-rate-computation trade-offs. Prior work has taken a piecemeal approach to these difficulties, requiring post-processing or multistage training, which is not efficient and does not scale well to big datasets or architectures. To encourage small model size, the strategy is a joint training objective that penalizes the self-information of network parameters in a reparameterized latent space, while simultaneously incorporating priors that increase structured sparsity in the parameter space to decrease computation. Continue Reading Paper: https://arxiv.org/pdf/2204.02965.pdf Github: https://github.com/Sharath-girish/LilNetX submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    What would happen if you connected inputs and outputs randomly to a large Hebbian spiking NN and let it learn to shape itself in an environment?
    submitted by /u/The_impact_theory [link] [comments]  ( 2 min )
    noob here who doesn’t really understand calculus
    If I want to take the partial derivative of the error with respect to a certain weight, it would be similar to taking the derivative of, say, y = value * weight + bias. But if I hold the value and bias still, the derivative just becomes the value, like how the derivative of y = 3x with respect to x is just 3… so what do I do? It doesn't make sense to multiply 3 by a learning variable and make that the new weight, so what am I missing? submitted by /u/-i-hate-this-place- [link] [comments]  ( 2 min )
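To make the missing step concrete: the partial derivative is not the new weight. It is scaled by a small learning rate and subtracted from the current weight, so the error shrinks a little on each step. A minimal sketch with made-up numbers:

```python
# For y = value * weight + bias, the derivative of y w.r.t. the weight is
# the input value (same logic as d(3x)/dx = 3). For squared error
# E = (y - target)**2, the chain rule gives dE/dweight = 2*error*value.
value, target = 3.0, 12.0       # one made-up training example
weight, bias, lr = 1.0, 0.0, 0.01

for _ in range(200):
    y = weight * value + bias
    error = y - target
    weight -= lr * 2 * error * value   # step *against* the gradient
    bias   -= lr * 2 * error

print(round(weight * value + bias, 3))  # prediction converges to 12.0
```

The learning rate is what keeps each update small; replacing the weight by the derivative itself would ignore both the current weight and the size of the error.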
  • Open

    Does the reward in reinforcement learning have to be immediate reward?
    I'm trying to train a seq2seq model that generates a sentence with T words using reinforcement learning. The input and all the previously generated words form the state of the environment, and generating a word in the sentence is considered an action. In the previous methods [1, 2], the immediate reward r(s_t, a_t, s_{t+1}) for the t-th action a_t is 0 when t < T, and the reward is the CIDEr score (a scalar that measures the quality of the sentence) of the entire sentence when t = T. The policy is updated after the entire sentence is generated. I designed a new reward for each action, and the new reward for the t-th action is not zero when t < T. However, the reward of each action can only be calculated when the entire sentence is generated since it relies on the CIDEr score of the entire sentence, i.e. the reward for all the actions relies on the final state s_T. Can I still define the reward in the form of r(s_t, a_t, s_{t+1}) ? [1] Rennie et al. Self-critical sequence training for image captioning, CVPR 2017: 7008-7024. [2] Ranzato et al. Sequence level training with recurrent neural networks, ICLR 2016. submitted by /u/entalent [link] [comments]  ( 1 min )
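One way to see the question concretely: even though every action's credit depends on the final state, the scheme in [1, 2] can still be expressed as per-step returns computed after the episode ends, by discounting the terminal-only reward backwards. A minimal sketch (function name illustrative):

```python
# r_t = 0 for t < T and the sentence-level score at t = T, turned into a
# per-action return G_t by discounting backwards through the episode.
def returns_from_terminal_reward(T, final_score, gamma=1.0):
    rewards = [0.0] * (T - 1) + [final_score]
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

# With gamma = 1, every action is credited with the full sentence score,
# which is effectively what SCST-style sequence training does.
print(returns_from_terminal_reward(4, 0.8))
```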
    Question about math
    I am reading the paper A Distributional Perspective on Reinforcement Learning, and it is related to measure theory. Is it worth spending the time to study all of real analysis and measure theory? submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
    What does an oscillating explained_variance signify during training? (PPO)
    submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    SB3- HER+DQN for my simple discrete map env but the training result is pretty bad
    Hi all, I am creating a multi-goal environment: an 8×8 discrete map whose start and (single) terminal state change after each episode. The reward is 100 for reaching the terminal state and -1 otherwise. In fact, I am not sure if this reward is reasonable. Using PPO from SB3, I can easily solve it, but when I switch to off-policy HER+DQN, the training is very bad. Feel free to run it here or take a look at the env and training results. Thank you so much! https://colab.research.google.com/drive/1Mt5Yje7GTyjOBHL09zC9C1L05xpTAK9v?usp=sharing submitted by /u/AnimatorRemarkable20 [link] [comments]  ( 1 min )
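For HER specifically, the environment has to expose its reward in goal-conditioned form so relabeled goals can be re-scored. A hypothetical sketch (names illustrative, not taken from the linked notebook), using the sparse 0/-1 convention HER examples typically use:

```python
import numpy as np

# HER recomputes rewards for relabeled goals, so the reward must be a pure
# function of (achieved_goal, desired_goal). The sparse 0 / -1 convention
# keeps Q-values in a narrow range, which is often easier for DQN on a
# small map than a +100 / -1 scheme.
def compute_reward(achieved_goal, desired_goal, info=None):
    return 0.0 if np.array_equal(achieved_goal, desired_goal) else -1.0

print(compute_reward(np.array([3, 4]), np.array([3, 4])))  # at the goal
print(compute_reward(np.array([3, 4]), np.array([7, 0])))  # not yet
```

Rescaling the success reward this way is one of the first things worth trying when HER+DQN diverges while PPO on the raw reward works.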
    Number of Feature VS Action Space in Multi-agent Reinforcement Learning
    Hi All, I am working on a MARL fintech project where I use DDQN, and for the Q-value I use an LSTM because it is time-series data. Project overview: there are 7 features, which are derivatives of the ask and bid prices, and an action space of 12 actions. Is it possible to build a good, reliable model using only 7 features for 12 actions? The number and quality of features are important for making good decisions in RL. Open for suggestions. #Reinforcement_Learning #MARL submitted by /u/laxuu [link] [comments]  ( 1 min )
    Is there anyone interested in re-implementing APT?
    Hi, these days I'm really interested in self-supervised RL, especially approaches based only on state novelty, so I wanted to re-implement APT (Behavior From the Void: Unsupervised Active Pre-Training). However, my re-implementation showed no meaningful behaviors compared to the official implementation. The official implementation uses DrQ-v2 and an intrinsic curiosity module, so I want to re-implement APT as described in the paper (using DrQ-v1 and contrastive learning). Is there someone who could check my re-implementation? https://github.com/seolhokim/apt In that repository, DrQ-v1 works well, but APT alone doesn't work! I can't understand why the agent stops moving during pre-training. Really thank you for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
  • Open

    The Voice Synthesis Business: 2022 Update
    In the past few years, high-quality automated text-to-speech synthesis has effectively become a commodity, with easy access to cloud-based… Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 15 min )
  • Open

    Massaging Data using Pandas
    When we talk about managing data, it is quite inevitable to see data presented in tables. With column header, and […] The post Massaging Data using Pandas appeared first on Machine Learning Mastery.  ( 25 min )
  • Open

    MLCommons’ David Kanter, NVIDIA’s Daniel Galvez on Improving AI with Publicly Accessible Datasets
    In deep learning and machine learning, having a large enough dataset is key to training a system and getting it to produce results. So what does a ML researcher do when there just isn’t enough publicly accessible data? Enter the MLCommons Association, a global engineering consortium with the aim of making ML better for everyone. Read article > The post MLCommons’ David Kanter, NVIDIA’s Daniel Galvez on Improving AI with Publicly Accessible Datasets appeared first on NVIDIA Blog.  ( 2 min )
  • Open

    Just Tech: Centering Community-Driven Innovation at the Margins Episode 3 with Dr. Sasha Costanza-Chock
    Episode 135 | April 13, 2022 In “Just Tech: Centering Community-Driven Innovation at the Margins,” Senior Principal Researcher Mary L. Gray explores how technology and community intertwine and the role technology can play in supporting community-driven innovation and community-based organizations. Dr. Gray and her team are working to bring computer science, engineering, social science, and […] The post Just Tech: Centering Community-Driven Innovation at the Margins Episode 3 with Dr. Sasha Costanza-Chock appeared first on Microsoft Research.  ( 31 min )

  • Open

    How IoT Uses Machine Learning To Change The World
    IoT and machine learning are among the most advanced and rapidly evolving technologies in today’s modern world, simplifying human effort and making lives easier. These technologies have proven to streamline operations and workflows for various industries, and provide more robust and scalable applications that let users get things done seamlessly.  In recent… Read More »How IoT Uses Machine Learning To Change The World The post How IoT Uses Machine Learning To Change The World appeared first on Data Science Central.  ( 6 min )
    Drag-and-drop Data Pipelining: The Next Disruptor in ML
    Recent advances in machine learning (ML) and artificial intelligence (AI) technologies are helping enterprises across industries quickly move their use cases from the pilot stage to production and operationalization. According to a report by McKinsey & Company, by 2030, businesses that fully absorb AI could double their cash flow, while companies that don’t could… Read More »Drag-and-drop Data Pipelining: The Next Disruptor in ML The post Drag-and-drop Data Pipelining: The Next Disruptor in ML appeared first on Data Science Central.  ( 3 min )
    Advances Highlight the Future of IoT Security
    As the Internet of Things (IoT) is gradually moving from being a centralized structure to a more complex network of innumerable decentralized smart devices, the need for security of data will be acknowledged to a greater degree, thereby promoting the expansion of the global IoT security market. The larger the volume of the data transferred… Read More »Advances Highlight the Future of IoT Security The post Advances Highlight the Future of IoT Security appeared first on Data Science Central.  ( 3 min )
  • Open

    How to Win a Kaggle Competition with Bayesian Optimization
    submitted by /u/aidev2040 [link] [comments]
    The Moment A Neural Net Became Sentient For The First Time - AI Art Story [4K] #shorts
    submitted by /u/fooo-ooo [link] [comments]
  • Open

    Should I use an Encoder-Decoder CNN?
    I'm trying to make a model to play a car racing simulator. I have a dataset of the (human) inputs used to achieve fast lap times. I would like to make a model that reads the game's video output and predicts the arrow-key inputs needed for a fast lap time. It seems to me that a CNN with encoder-decoder layers trained on the keyboard inputs would work. Is this a good architecture? I'm also having a hard time finding useful literature. Please let me know if there is anything I should look into or do differently. submitted by /u/newroadkill [link] [comments]  ( 1 min )
    Top Trends & Predictions That Will Drive Data Science, AI and Machine Learning in 2022
    submitted by /u/saik2363 [link] [comments]  ( 1 min )
    Last Week in AI: OpenAI DALL-E 2 generates amazing images, Google's 540 billion parameters language model, Clearview AI branches out beyond police, and more!
    submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    Top Trends & Predictions That Will Drive Data Science in 2022
    submitted by /u/saik2363 [link] [comments]
    Conversation about the future, life and AGI
    submitted by /u/HumanSeeing [link] [comments]  ( 1 min )
    AI predicts if and when someone will experience cardiac arrest
    submitted by /u/qptbook [link] [comments]
    The last Woolly Mammoth on Earth
    Is it good or bad? Also, I was wondering what art goes big as an NFT? submitted by /u/Ok-Passion-6574 [link] [comments]
    Stanford Researchers Introduced a Novel Deep Learning Computer-Assisted System for Real-Time Open Surgery and AVOS (the Annotation Videos of Open Surgery) Dataset
    In recent years, the rise of deep learning has continuously brought innovations to many fields, and the medical domain is one of them. AI applications in this field are countless: from pre-operative diagnosis to disease classification, from skill assessment to post-operative rehabilitation. Among them, systems that assess surgical skills and provide feedback to improve technique could help decrease the number of complications in surgical procedures, which are still the third leading cause of death globally. AI can be an additional coach for surgical trainees and an expert colleague for experienced surgeons. But to train an AI system, reliable data are fundamental. The most utilized type of data in this context is undoubtedly video streams, as a camera is less invasive than other types of sensors, such as an ArmBand or EEG, which could weigh on the surgeon’s performance given their physical bulk. This applies particularly to laparoscopic surgery, where an in-body fiber-optic camera is used to visualize the operating area and facilitates rapid data collection. For this reason, the majority of computer-assisted systems focus on laparoscopic surgery. Continue Reading Paper: https://arxiv.org/pdf/2112.07219.pdf https://preview.redd.it/84mwdw81l4t81.png?width=741&format=png&auto=webp&s=636aff067560876d14f37caccb83bd951e991c68 submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Machine Learning vs. Cookie Consent Systems
    submitted by /u/DaveBowman1975 [link] [comments]
    Artificial Nightmares: Stone Golem Ruins || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    My epiphany on synthetic media five years later, and what I feel is coming within the next five years
    Roughly five years ago, I created this thread where I outlined my realization about the imminency of synthetic media. This was before transformers blew up, before StyleGAN, before GPT-2, when WaveNet and DeepDream were still among the best we could do, and when predictive text algorithms that were barely better than Markov Chains were still the state of the art. In five short years, the state of artificial intelligence has changed overwhelmingly, to the point it's barely recognizable. Looking back to 2017, I now get this sense of everything feeling so primitive and fake. I've stated many times that AI before roughly 2019 was a bunch of digital magic tricks, and the field as a whole was essentially a giant Potemkin village that utilized clever sleight of hand and advertising to make it se…  ( 7 min )
  • Open

    [N] Substantial plagiarism in BAAI’s “a Road Map for Big Models”
    BAAI recently released a two-hundred-page position paper about large transformer models which contains sections plagiarized from over a dozen other papers. In a massive fit of irony, this was found by Nicholas Carlini, a researcher who (among other things) is famous for studying how language models copy outputs from their training data. Read the blog post here submitted by /u/StellaAthena [link] [comments]  ( 2 min )
    [D] What's the status of live speech-to-speech conversions? (not TTS)
    I've been trying to find information about the subject, but almost every result is TTS, and the only example of what I actually want (Respeecher) costs 2 grand a year. Are there any (preferably open-source) other alternatives? submitted by /u/UncertainOutcome [link] [comments]  ( 1 min )
    Comparison of workshops at major conferences. [D]
    I understand that workshop quality is dependent more on the workshop itself rather than the host conference. But, in general, how are workshops from CVPR, NeurIPS, ICLR, ICML, etc. viewed by the community in relation to one another? submitted by /u/avd4292 [link] [comments]  ( 1 min )
    [Project] Leniax - A Lenia simulation library powered by JAX
    Hi everyone! I'm really happy to finally publish the work I've been doing on the Cellular Automata called Lenia. It is a JAX library called Leniax and allows one to simulate thousands of simulations in parallel using CPU, GPU, or TPU. With it, you can: Simulate Conway's Game of Life Simulate multiple Lenia simulations in parallel Use gradient descent to search for Continuous CA parameters Launch a QD search to discover a ton of diversity in Lenia. Check out the blog post for some visual results The main goal of this work was to advance the state of automatic discovery for those systems. 10 months ago, I bet on QD to do so, turns out it indeed works! QD algorithms really rock! The code is completely open-source with all the examples, notebooks, and even experiments I ran. (See the doc for more links) I would love to have feedback on this and of course, if you find that subject interesting, engage with our community! Cheers! submitted by /u/morgangiraud [link] [comments]  ( 2 min )
    [D] Effective Image Pre-Processing Techniques for Enhancing Defects in an Image?
    So I am doing object detection on pavement defects. I've already collected the data with some annotations, but the model is performing rather poorly; for example, the mAP is about 0.12 on the whole dataset. Examining the data, I think one of the reasons is that some of the defects, such as cracks or faded pavement markings, are not very clear: they are either cast in shadow or washed out by bright sun. Image Example #1 Or blurred by motion. Image Example #2 Is there any image preprocessing technique, aside from maybe CLAHE, that could be applied? Moreover, I am currently using YOLOv5 for this. submitted by /u/sarmientoj24 [link] [comments]  ( 3 min )
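For reference, even plain global histogram equalization (CLAHE is the adaptive, tile-based refinement of this, via OpenCV's cv2.createCLAHE) already stretches the dynamic range of over-dark or washed-out crops. A dependency-free sketch on a synthetic low-contrast image:

```python
import numpy as np

# Global histogram equalization: build the cumulative distribution of
# intensities and use it as a lookup table to remap each pixel, spreading
# a narrow intensity band across the full [0, 255] range.
def equalize(img):
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    lut = (cdf * 255).astype(np.uint8)                 # intensity remap table
    return lut[img]

rng = np.random.default_rng(0)
# Synthetic low-contrast 8-bit image confined to roughly [100, 120]:
img = rng.integers(100, 121, size=(64, 64), dtype=np.uint8)

eq = equalize(img)
print(np.ptp(img), np.ptp(eq))  # dynamic range before vs. after
```

As augmentation rather than preprocessing, random brightness/contrast jitter and motion-blur augmentations (e.g. via Albumentations) often help more than any fixed transform, since the model then sees shadowed and blurred variants at training time.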
    [P] the copent package v0.2.3 available on PyPI now
    The copent package implements methods for estimating copula entropy (mutual information) and transfer entropy (conditional mutual information / conditional independence). This version adds a new feature (an argument 'mode') for dealing with large data when memory is limited. Github: https://github.com/majianthu/pycopent PyPI: https://pypi.org/project/copent/ any comments are welcome. submitted by /u/majianthu [link] [comments]  ( 1 min )
    [D] Feedback on a worked Continuous Deployment Example (CI/CD/CT)
    Hey everyone! At ZenML, we released today an integration that allows users to train and deploy models from pipelines in a simple way. I wanted to ask the community here whether the example we showcased makes sense in a real-world setting: Context ZenML is an extensible, open-source MLOps framework to create production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and has interfaces/abstractions that are catered towards ML workflows. Seldon Core is a production grade open source model serving platform. It packs a wide range of features built around deploying models to REST/GRPC microservices that include monitoring and logging, model explainers, outlier detectors and various continuous deployment strategies such…  ( 2 min )
    [D] Can we decrease the training time of a deep learning model by using a domain specific pretrained backbone instead of the standard imagenet?
    I am working in the retail domain at the moment and train a lot of image classifiers. I have always started from ImageNet-pretrained weights. I thought it would be straightforward to train a backbone on a big retail dataset (1000+ classes), use that as the pretrained backbone instead, and thereby reduce the time it takes for my models to generalize. It turns out the model took more epochs to train with the retail backbone than with the ImageNet one. Isn't this counter-intuitive? What else can I do to make my backbone better? submitted by /u/lMAObigZEDONG [link] [comments]  ( 1 min )
    [D] Removing Unpredictable Samples from a Training Set
    Hi, I have a fairly interesting project that I am working on. I have a dataset in which some samples are completely unpredictable random noise, while others are reliably predictable. How would you go about separating out the samples that can be predicted, identifying them going forward, and retraining on a cleaned dataset with only those samples? Interested to see someone else's approach to this. Edit: I forgot to mention that my data comes from an embedding matrix over ordinal categorical features. submitted by /u/Katapilla_Killa [link] [comments]  ( 2 min )
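One common recipe for this kind of filtering is to score every sample by its out-of-fold prediction error and drop the worst tail, then retrain on the rest. A minimal numpy sketch; the 1-nearest-neighbour stand-in model and the 70% keep-quantile are placeholder choices, not recommendations:

```python
import numpy as np

def flag_predictable(x, y, n_folds=5, keep_quantile=0.7, seed=0):
    """Score each sample by its out-of-fold error and keep the most
    predictable fraction; anything above the quantile is treated as noise."""
    rng = np.random.default_rng(seed)
    n = len(y)
    fold = rng.permutation(n) % n_folds          # random equal-size folds
    err = np.empty(n)
    for f in range(n_folds):
        tr, te = fold != f, fold == f
        # toy stand-in model: 1-nearest-neighbour regression on x;
        # any model with fit/predict would slot in here instead
        d = np.abs(x[te, None] - x[None, tr])
        err[te] = np.abs(y[te] - y[tr][d.argmin(axis=1)])
    return err <= np.quantile(err, keep_quantile)

# toy data: first 80 points follow y = x exactly, last 20 are pure noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y = x.copy()
y[80:] = rng.uniform(-5, 5, 20)
keep = flag_predictable(x, y)
```

The returned boolean mask can both clean the training set and, with a held-out threshold, flag unpredictable samples going forward.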
    [D] What's your experience with Model-Agnostic Meta-Learning in RL?
    There is the original paper, and there was a subsequent paper by other authors titled "On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning". I've been working on implementing the latter paper on the HalfCheetah environment. However, my attempts have been unsuccessful so far (I know the authors provided the code, but I am trying to write my own to check my understanding). I'd like to know any tips/tricks that anyone can share, and just to hear about people's experiences, especially using MAML for RL. submitted by /u/carlml [link] [comments]  ( 1 min )
    How to deal with the fact that whatever idea I have has already been published.
    I'm always having ideas for projects and papers; then I look around a bit and find someone who has already studied that idea and published it. It's genuinely annoying. It's been 6 months now, and all the papers are newly published (2021 mostly), so it's even more annoying. How do you deal with that? And how do you find a niche that no one is touching? I just started a PhD, so it's really stressing me out. I feel like I'll never be able to advance on my thesis and that I should just quit, because better work has already been done. submitted by /u/AlanRoofies [link] [comments]  ( 4 min )
    [P] Faster version of cv2.BFMatcher(cv2.NORM_L2) optimized for keypoints matching
    Hi there, in case any of you use the OpenCV BFMatcher with NORM_L2, you can try my recent pet project: https://github.com/kmkolasinski/fast-bfmatcher Basically, the speed-up is achieved by using a faster BLAS replacement (the BLIS library) and some custom implementations written in C and Cython. submitted by /u/kmkolasinski [link] [comments]  ( 1 min )
    [D] Are there any comparison studies on learning rate schedules for generative transformers?
    My current research heavily involves generative vision transformers, and after some experimentation it seems like the choice of LR scheduler is a crucial factor for proper convergence. Does anyone know of any recent comparison studies that explore various types of schedulers for generative tasks? submitted by /u/Megixist [link] [comments]  ( 4 min )
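For anyone who wants to run such a comparison themselves, the usual candidates are cheap to implement as pure functions of the step count. A sketch of the common linear-warmup + cosine-decay baseline (all the constants here are illustrative defaults, not tuned recommendations):

```python
import math

def lr_at(step, total_steps, base_lr=3e-4, warmup_steps=500, min_lr=1e-6):
    """Linear warmup followed by cosine decay, a common transformer default;
    one hypothetical baseline to compare against constant or inverse-sqrt
    schedules in an ablation."""
    if step < warmup_steps:
        # ramp linearly from base_lr/warmup_steps up to base_lr
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # cosine anneal from base_lr down to min_lr
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

schedule = [lr_at(s, total_steps=10_000) for s in range(10_000)]
```

Because the schedule is a pure function, swapping in alternatives (inverse-sqrt, step decay, constant) for an A/B comparison only means replacing `lr_at`.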
    [N] Fine-Tuning LayoutLM v2 For Invoice Recognition
    With the advent of deep learning models, automated data extraction is becoming more accessible. In this article, we demonstrate step-by-step how to fine-tune LayoutLM v2 on invoices, from data annotation to model training and inference. Enjoy the read, and if you have any questions, leave them below. submitted by /u/UBIAI [link] [comments]  ( 1 min )
    Lidar-Camera Deep Fusion for Multi-Modal 3D Detection
    Posted by Yingwei Li, Student Researcher, Google Cloud and Adams Wei Yu, Research Scientist, Google Research, Brain Team LiDAR and visual cameras are two types of complementary sensors used for 3D object detection in autonomous vehicles and robots. LiDAR, which is a remote sensing technique that uses light in the form of a pulsed laser to measure ranges, provides low-resolution shape and depth information, while cameras provide high-resolution shape and texture information. While the features captured by LiDAR and cameras should be merged together to provide optimal 3D object detection, it turns out that most state-of-the-art 3D object detectors use LiDAR as the only input. The main reason is that to develop robust 3D object detection models, most methods need to augment and transform th…  ( 8 min )
    Very Deep Neural Networks Explained in 40 Seconds
    By Vincent Granville, Ph.D., Author at MLtechniques.com Sponsored Post Very deep neural networks (VDNN) illustrated with data animation: a 40 second […] The post Very Deep Neural Networks Explained in 40 Seconds appeared first on Machine Learning Mastery.  ( 4 min )
    Scientific Functions in NumPy and SciPy
    Python is a general-purpose computation language, but it is very welcomed in scientific computing. It can replace R and Matlab […] The post Scientific Functions in NumPy and SciPy appeared first on Machine Learning Mastery.  ( 12 min )
    What can you tell us about him?
    (A Sci-Fi Ultrashort)  ( 5 min )
    MIT’s FutureMakers programs help kids get their minds around — and hands on — AI
    The programs are designed to foster an understanding of how artificial intelligence technologies work, including their social implications.  ( 8 min )
    Best GridWorld environment?
    In your opinion, what is the best gridworld environment? I want to compare different RL algorithms on it. I’m looking for something super basic:
    - start and goal state
    - some obstacles
    - customisable: move the start and goal state, place obstacles at different points, modify the reward map, etc.
    - computationally efficient
    Thank you submitted by /u/wiston_smith [link] [comments]  ( 1 min )
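For reference, a gridworld with exactly those knobs fits in a few dozen lines, so rolling your own is a reasonable fallback if no library fits. A minimal sketch with a gym-style reset/step interface (the reward values are arbitrary choices):

```python
class GridWorld:
    """Tiny customisable gridworld: start/goal anywhere, arbitrary obstacles,
    step penalty of -1 and goal reward of +10 (all trivially changeable)."""

    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up/down/left/right

    def __init__(self, size=(5, 5), start=(0, 0), goal=(4, 4), obstacles=()):
        self.size, self.start, self.goal = size, start, goal
        self.obstacles = set(obstacles)
        self.pos = start

    def reset(self):
        self.pos = self.start
        return self.pos

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        # walls and obstacles leave the agent in place
        if 0 <= r < self.size[0] and 0 <= c < self.size[1] and (r, c) not in self.obstacles:
            self.pos = (r, c)
        done = self.pos == self.goal
        return self.pos, (10.0 if done else -1.0), done

env = GridWorld(obstacles=[(2, 2)])
obs = env.reset()
```

Moving the start/goal, adding obstacles, or swapping in a reward map are all one-line changes, which makes algorithm comparisons reproducible.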
    Custom Callback for Max Episode Reward using Stable Baselines3 with Custom Env
    Hi all, I've built a custom gym env and am using Stable Baselines3 to train an agent. I would like to visualise in TensorBoard the maximum reward achieved for each episode. I have these values in a list in my env, and I am trying to create a custom Callback to plot this in TensorBoard but it's not working. I've looked over the documentation and other forums but can't figure out how to do this. Can anyone help me out? 🙏🏽 Thank you! submitted by /u/leozinho2r [link] [comments]  ( 1 min )
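A framework-free sketch of the pattern may help: with Stable-Baselines3 you would subclass `BaseCallback`, read the value inside `_on_step`, and log it via `self.logger.record(...)` so it shows up in TensorBoard. The `episode_rewards` attribute on the env is an assumption standing in for wherever your env keeps that list:

```python
class MaxEpisodeRewardCallback:
    """Sketch of the callback pattern. In SB3 this would subclass
    BaseCallback and call self.logger.record("rollout/max_episode_reward", v)
    from _on_step; env.episode_rewards is an assumed attribute here."""

    def __init__(self, env, logger):
        self.env, self.logger = env, logger

    def on_step(self, step):
        if self.env.episode_rewards:              # skip before first episode ends
            self.logger.record("rollout/max_episode_reward",
                               max(self.env.episode_rewards))

class ListLogger:                                 # stand-in for SB3's logger
    def __init__(self): self.records = []
    def record(self, key, value): self.records.append((key, value))

class DummyEnv:                                   # stand-in for the custom env
    episode_rewards = [3.0, 7.5, 5.0]

logger = ListLogger()
MaxEpisodeRewardCallback(DummyEnv(), logger).on_step(0)
```

One common gotcha in the real SB3 setup: with vectorized envs the callback sees the wrapped `VecEnv`, so the raw env's attributes have to be reached via `self.training_env.get_attr(...)`.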
    Open-sourced NetHack 2021 NeurIPS Challenge winning agent
    Recently, we released the source code of our winning solution for the NetHack 2021 NeurIPS Challenge: https://github.com/maciej-sypetkowski/autoascend We hope that it will help in leveraging this complex environment, which still seems to be beyond the capabilities of reinforcement learning. Check out the links in the README "Description" section for more context. submitted by /u/procedural_only [link] [comments]  ( 1 min )
    Project
    Does anyone have a good RL project suggestion, or a complete project suitable for a college major? I have only done work on some self-playing Atari and Mario games. If you have any good ideas, please suggest 🙌 submitted by /u/stoned_egineer [link] [comments]  ( 1 min )
    Training a DQN agent for platformer game
    Does anyone have experience training agents to play platformer games like Mario? I am trying to train an agent for the platformer game Jump King to get past at least a few levels using DQN, but the agent is performing poorly after 8000 episodes of training (one episode being: the agent spawns at the start and has ~15 seconds to jump around gaining reward). It is barely able to get past the first level most of the time :c I am using a very basic sequential network of 2 linear layers with inputDim 4, outputDim 4, and hiddenDim 32, because my state does not use any image data; it is just (current_level, x_pos, y_pos, jumpCount) as input to the network. As for the reward, I am using the y position: a large reward if the agent reaches a new level, a small reward for making progress in the current level (curr_y > old_y), and otherwise a negative reward. Should I consider using a CNN and image data to train this agent like in the Atari games paper, or is a conv net on image data likely to perform worse than my current state? Should I combine image data with the current state, or just keep the current non-image state? Also, roughly how long should I be training the agent for? Is 8000 episodes not enough? One episode takes roughly 7 seconds of wall-clock time (it uses the pygame engine, and turning off rendering made it a little faster). This is my first time training an agent for a hard game like this using DQN, so I would appreciate any tips/advice to improve the agent! repo: https://github.com/senweim/JumpKingAtHome submitted by /u/TernaryJimbo [link] [comments]  ( 2 min )
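One thing worth checking before reaching for images: with a `curr_y > old_y` reward the agent can farm reward by oscillating up and down. A sketch of shaping against the *best* height reached so far instead (the bonus/penalty constants are illustrative, not tuned):

```python
def shaped_reward(level, y, best_level, best_y,
                  level_bonus=100.0, step_penalty=-0.5):
    """Reward only *new best* progress: big bonus for a new level, the height
    gain for a new best y within the level, a mild penalty otherwise.
    Returns (reward, new_best_level, new_best_y) so the caller can track state.
    Rewarding best-so-far progress rather than curr_y > old_y stops the agent
    from farming reward by bouncing up and down in place."""
    if level > best_level:
        return level_bonus, level, y
    if level == best_level and y > best_y:
        return (y - best_y), best_level, y
    return step_penalty, best_level, best_y

r, bl, by = shaped_reward(level=0, y=12.0, best_level=0, best_y=10.0)
```

The running `(best_level, best_y)` pair would live in the env and be threaded through each step.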
    Which environment impress you? (related with software architecture, API, ...)
    I want to hear about environments that impress you! Specifically, I want to build my custom environment well, using libraries like OpenAI Gym. In this context, I found highway-env https://github.com/eleurent/highway-env/ ! I think this environment has a convenient API for users, so I am building my custom env with highway-env as a reference. https://preview.redd.it/ndl1h1dvpzs81.png?width=711&format=png&auto=webp&s=267320a9713d98e7e49c4bb89423e5a9612bad8e Along these lines, could you share your favorite environment? Any reason for liking it is welcome! submitted by /u/Seungeon94 [link] [comments]  ( 1 min )

    Strategies to deal with Large Action Spaces
    Hey guys, I tried building a PPO model for Wordle. My initial test was checking the performance of the model with just 100 words. The agent was able to learn within a few thousand epochs and had an average guess length of about 2.8 before it could correctly identify the words. However, when I extend the action space to the entire 2.3k words, the model barely learns. Even after a few hundred thousand iterations, the mean length hovers around 5.9 (Wordle has a max of 6 attempts per game). Any suggestions on how to help the agent learn faster in large action spaces? I also tried an embedding-based approach, but the performance was very similar. Thanks submitted by /u/altair9335 [link] [comments]  ( 1 min )
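One technique that often helps in large discrete action spaces is invalid-action masking: for Wordle, words inconsistent with earlier feedback can be masked out, which shrinks the effective action space every turn. A numpy sketch of the masking step, not tied to any particular PPO implementation:

```python
import numpy as np

def masked_policy(logits, valid_mask):
    """Invalid-action masking: set the logits of invalid actions to -inf
    before the softmax so they receive exactly zero probability (and hence
    zero gradient through the sampled action)."""
    masked = np.where(valid_mask, logits, -np.inf)
    z = masked - masked.max()            # stabilise the softmax
    p = np.exp(z)                        # exp(-inf) == 0 for masked entries
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.5, 3.0])
mask = np.array([True, True, False, False])  # only the first two words remain valid
probs = masked_policy(logits, mask)
```

Note the highest raw logit (index 3) gets zero probability once masked; without masking, the policy wastes most of its exploration on guesses that are already ruled out.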
    What are your best results for ProcGen: CoinRun?
    Has anyone managed to get a consistent score of > 9 on CoinRun? I understand that some generated levels require LSTMs in order to be solvable 100% of the time, but even excluding these hard-core levels I can see occasions where my agents are not operating at 100% effectiveness. For some reason, "fully solving" CoinRun seems harder than expected. The papers on CoinRun usually just show the results after 100M steps or so, but I am more interested in what the community has achieved with "normal setups". submitted by /u/tmuxed [link] [comments]  ( 1 min )
    How to use the same action in trained RL network, when model is retested?
    I trained an RL agent using the Stable Baselines library and a Gym env. When I test the agent, it takes different actions every time I re-run the script, even though I use the same seed in the test env. for i in range(length-lags-1): action, _states = model.predict(obs_test) obs_test, rewards, dones, info = env_test.step(action) When I run the above code again, I get different results. submitted by /u/Mariam_Dundua [link] [comments]  ( 1 min )
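A likely cause: Stable-Baselines' `model.predict` samples from the action distribution by default, so seeding only the test *env* does not make rollouts repeatable; passing `deterministic=True` makes it return the most likely action instead. A toy numpy illustration of the difference (the `predict` here is a stand-in for the idea, not the SB3 function itself):

```python
import numpy as np

probs = np.array([0.1, 0.6, 0.3])   # policy's action distribution for one state

def predict(probs, rng, deterministic=False):
    """Mimics the relevant behaviour: sample from the distribution by default,
    return the argmax when deterministic=True. Seeding the environment alone
    does not pin down the sampling RNG, hence the run-to-run variation."""
    if deterministic:
        return int(probs.argmax())
    return int(rng.choice(len(probs), p=probs))

stochastic = [predict(probs, np.random.default_rng(run)) for run in range(5)]
deterministic = [predict(probs, np.random.default_rng(run), deterministic=True)
                 for run in range(5)]
```

In SB3 the analogous call is `model.predict(obs_test, deterministic=True)`; for fully reproducible *stochastic* rollouts you would additionally need to seed the global RNGs the library samples from.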
    Unity RL ml agents module, walker example
    Hi all, I'm trying to teach my custom FBX model to walk with the help of PPO, as in the walker example from ML-Agents. I'm having difficulties with the import itself and with assigning the Rigidbody: the neural network is training, but for some reason the physics does not work. Has anyone seen this, or does anyone have an example of how to train a custom FBX model in Unity using ML-Agents? Thx all! submitted by /u/IndependenceCivil576 [link] [comments]  ( 1 min )
    Implementing RL algorithm on apache spark
    I want to run an RL algorithm on Apache Spark. However, RL does not exist in Spark's MLlib. Is it possible to implement it? Any links may help. Thank you in advance submitted by /u/fatenLouati [link] [comments]  ( 1 min )
    Is reinforcement learning being used for the development of self-driving cars
    We will first introduce the general process of self-driving tasks and then the development of reinforcement learning in self-driving cars. The general process of self-driving includes perceiving, decision-making, planning, and controlling. Perceiving tasks have adopted deep learning, and that has done a good job. Unlike supervised learning, decision-intelligence AI methods, represented by reinforcement learning, model the environment as a Markov Decision Process (MDP) for optimization. In sequential decision problems, the utility of an agent's actions does not depend on a single decision, expressed through the state the agent would reach as the result of that decision, but rather on the whole sequence of the agent's actions. Here, one thing that needs t…  ( 4 min )
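The MDP framing above can be made concrete with value iteration on a toy problem. A self-contained sketch; the two-state "lane change" MDP is invented purely for illustration:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=200):
    """Tiny MDP solver. P[a, s, s'] are transition probabilities, R[s, a]
    rewards. Illustrates the sequential-decision point above: a state's value
    depends on the whole future action sequence, not on a single decision."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        # Q[s, a] = R[s, a] + gamma * E[V(next state)]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

# made-up 2-state, 2-action "lane" MDP: action 0 = stay, action 1 = switch
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0],     # in state 0, switching pays
              [1.0, 0.0]])    # in state 1, staying pays
V, policy = value_iteration(P, R)
```

The optimal policy (switch from state 0, then stay in state 1 forever) only emerges from propagating value through the whole horizon, which is exactly why single-step decision rules fall short in these problems.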
    [R] Transformers replicate Hippocampal representations; notably place and grid cells in the brain
    Paper: https://arxiv.org/abs/2112.04035 Yes, the paper is cautious about comparing the model one-to-one to the brain: “Note, we are not saying the brain is closely related to transformers because it learns the same neural representations, instead we are saying the relationship is close because we have shown a mathematical relationship between transformers and carefully formulated neuroscience models of the hippocampal formation.” While objections like "it's just correlation/relation, it's not exactly the same!!" are true to an extent, it's still a very unexpected observation that they're even remotely similar. Needless to say, Transformers were not inspired by the brain, and as more evidence collates (https://www.nature.com/articles/s42003-022-03036-1 --> activations are linearly correlatable) it does feel mysterious; perhaps at least some of the systems used by the brain converge on an efficient pattern discovered by our backpropagated friends... [insert 'coincidence? I think not!' meme] submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 1 min )
    [D] What is the smallest, most capable, generative language model available now?
    I'm looking for a generative-LM equivalent of an EfficientNet-Lite, for inference on devices with limited to no VRAM. I know about some popular ones like DistilGPT2. But it's been 2 years after its release. Surely, someone improved their size/performance ratio, right... right? Thank you for your time. 🤗 submitted by /u/Deep-Station-1746 [link] [comments]  ( 1 min )
    How would you rank major tech companies' research labs for prestige? [D]
    This is just for fun, not to be taken too seriously. But I'm curious what reputations the various research divisions (specifically AI/ML) of major companies have in the community, e.g. Google, Facebook/Meta, Microsoft, Amazon, NVIDIA, IBM, etc. My perceived (albeit naive) view is that Google > Facebook > MSR make up the top tier. I don't know much about the others, but I've read that some people consider MSR the most prestigious due to its academic environment. Still, Google and FB seem to dominate in terms of major publications, e.g. vision transformers are associated with Google. submitted by /u/avd4292 [link] [comments]  ( 2 min )
    [P] Squirrel: A new OS library for fast & flexible large-scale data loading
    Hi all, Today we open-sourced Squirrel, a data infrastructure library that my colleagues and I have been working on over the past 1.5 years: https://github.com/merantix-momentum/squirrel-core We’re a team of ~30 ML engineers developing machine learning solutions for industry and research. Across all our projects, we need to load large-scale data in a fast and cost-efficient way, while keeping the flexibility to work with any possible dataset, loaded from local storage, remote data buckets or via APIs such as HuggingFace. Not finding what we were looking for, we decided to build it ourselves. Squirrel has already proven its value in our deep learning projects at Merantix Momentum and shows competitive benchmark results (check them out here). We’re super excited to share it with the OSS community and hope that you can benefit from it as well! Looking forward to hearing your feedback and questions :) submitted by /u/Nextpenade [link] [comments]  ( 1 min )
    [P] Renting lots of GPUs (100-200) in single environment?
    I want to apply an already-trained ML model to a huge textual dataset. I have funds to rent cloud GPUs, but not much experience using them. Preferably, I set up the environment only once (downloading the data, model, software packages, etc.) and then simply send ~100-200 scripts each to their own GPU for processing. Then at the end everything is in the same location and I can easily send the final result (~100-200 output files concatenated together) back to my PC. Any advice on how to do that? All GPU rental services seem to offer only 1-8 GPUs per server and do not seem to allow sharing of the environment across servers, which seems very inefficient to me. All comments are appreciated. submitted by /u/Intelligent-End2673 [link] [comments]  ( 2 min )
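Whatever provider ends up being used, the embarrassingly-parallel part is easy to handle in the scripts themselves: give each job a shard index and have it process a deterministic slice of the file list, then concatenate the per-shard outputs. A sketch (the file names and output paths are illustrative):

```python
def make_shards(files, n_workers):
    """Deterministic round-robin sharding: worker i processes files[i::n_workers].
    Each of the ~100-200 jobs receives its shard index (e.g. via an environment
    variable or a CLI flag), writes e.g. results/shard_{i}.out, and the final
    result is just the concatenation of those files."""
    return [files[i::n_workers] for i in range(n_workers)]

files = [f"doc_{j:04d}.txt" for j in range(10)]
shards = make_shards(files, n_workers=3)
```

Because sharding is a pure function of the sorted file list and the worker count, any failed job can be re-run in isolation without touching shared state, which sidesteps the "shared environment" problem entirely.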
    [R][P] Algorithmic stability of minibatch SGD
    Hi, was wondering if anyone else has looked into "A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent" and in particular eqn. 2, which is the derivation of Section 3.5 of "Train faster, generalize better: Stability of stochastic gradient descent" adapted for the case where the underlying loss we are interested in guaranteeing generalisation for is upper bounded by M (rather than 1 as assumed by Hardt et al). In the case of minibatch SGD, the number of datapoints n becomes the number of minibatches, as ideally one would like to reduce the number of steps T by maximizing the learning rate, which requires maximizing the minibatch size for the loss to actually converge to 0 on the training data. However, what I am unsure about is, specifically for the classification task where we typically minimize the cross-entropy objective, whether the cross-entropy objective is an upper bound on any kind of M-bounded loss function. In the ideal world, I would like to show that it upper bounds the 0-1 loss which means the cross-entropy over the dataset is an upper bound on the classification accuracy and any generalization statement automatically becomes a statement about the very practical metric of accuracy. Such a statement about cross-entropy upper-bounding 0-1 is made in Section 3C of "Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization". However, one can provide a counterexample in the limit of the softmax "temperature" parameter where the predicted class distribution becomes uniform, in the case of 2 classes, for the typical case of log being the natural logarithm (it is no longer a counter-example if log base 2 is used). I haven't been able to show or find proof that this statement "xent >= 0-1" is true (for some logarithm base and some number of classes) and was hoping that someone might have. submitted by /u/wakeupandshave [link] [comments]  ( 1 min )
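The counterexample in the last paragraph can be checked numerically; a quick sanity check of the base-e versus base-2 claim for the two-class uniform prediction:

```python
import math

# With 2 classes and a uniform predicted distribution, natural-log
# cross-entropy is ln 2 ~= 0.693, which is *below* the 0-1 loss of 1 when the
# tie is broken towards the wrong class -- so natural-log cross-entropy does
# not upper-bound the 0-1 loss. With log base 2 the same prediction costs
# exactly 1, and the claimed bound survives for this case.
xent_nat = -math.log(0.5)      # natural log: ~0.693 < 1
xent_base2 = -math.log2(0.5)   # base 2: exactly 1
```

This only settles the two-class uniform case, of course; it does not resolve the general "xent >= 0-1" question raised above for arbitrary class counts.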
    [R] Channel Augmented Joint Learning for Visible-Infrared Recognition
    Since going open source in March 2020, MindSpore has gone from strength to strength. The deep learning framework has been downloaded by over 1.2 million users; algorithms running on MindSpore have been published in AI journals or presented at conferences; and countless developments have been released in device-edge-cloud scenarios to transform business fields such as intelligent manufacturing, cloud, wireless, data communication, energy, and consumer business. Built on extensive experience from the scientific, academic, and industrial sectors, MindSpore-based AI papers accounted for 11% of all AI papers in October 2021, ranking No. 2 worldwide by month, and No. 3 worldwide in Q4 2021. In this blog post, based on a paper published at ICCV 2021 by Professor Mang Ye of Wuhan University, we intro…  ( 3 min )
    [R] Looking for Ideas in Pre-training a RL Agent
    Hi all, I've been working on reinforcement learning lately, but wanted to come to the general ML subreddit to seek inspiration from other disciplines. I've been working on strategies to decrease the training time for my real-world inverted pendulum experiment. Specifically, I am trying to pre-train the Q network in a simulation before deploying. The strategy I have found most successful so far is this:
    start with randomly generated weights
    REPEAT FOR EACH EPOCH:
    - Load new_weights into the Q network
    - Initialize an environment with randomly generated parameters (i.e. random mass, lengths, etc.)
    - Train the agent on the environment for 100 episodes
    - Save new_weights
    I have tried a variety of strategies to add a little more control over this process. I've tried a soft update that never showed improvement:
    W = old_weights * (1 - alpha) + new_weights * alpha
    I have tried an additive update which was slightly successful, measuring the success of each network as the sum of rewards over the epoch:
    A = old_R / (old_R + new_R) ; B = new_R / (old_R + new_R)
    W = old_weights * A + new_weights * B
    But none of these work as well as just using the most recent weights. I've included some results if anyone's interested: the first graph is three test trials with random initial weights, the second graph is with pre-trained weights. This is a pretty hand-wavy way of doing this; does anyone have any suggestions to do this better? https://preview.redd.it/9mnewmibbts81.png?width=375&format=png&auto=webp&s=a6ff866a31987375d276d12f69dbe2af40380bf4 https://preview.redd.it/096n7ctcbts81.png?width=375&format=png&auto=webp&s=b19d800fbb416fca288e86640d9458c8993e0759 submitted by /u/nickthorpie [link] [comments]  ( 2 min )
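For concreteness, the two update rules quoted above take only a few lines of numpy when weights are handled as a list of per-layer arrays (the shapes and reward values below are illustrative):

```python
import numpy as np

def soft_update(old_w, new_w, alpha=0.1):
    """The soft update described above: W = old * (1 - alpha) + new * alpha,
    applied layer by layer."""
    return [(1 - alpha) * o + alpha * n for o, n in zip(old_w, new_w)]

def reward_weighted_update(old_w, new_w, old_r, new_r):
    """The additive, reward-weighted variant: each network's weights are mixed
    in proportion to the epoch reward it earned."""
    a = old_r / (old_r + new_r)
    return [a * o + (1 - a) * n for o, n in zip(old_w, new_w)]

old = [np.zeros((2, 2))]       # stand-in for one layer of Q-network weights
new = [np.ones((2, 2))]
blended = reward_weighted_update(old, new, old_r=1.0, new_r=3.0)
```

One caveat worth noting with either rule: averaging weights of independently trained networks is only meaningful when the networks stay in the same loss basin (e.g. warm-started from the same weights); otherwise the blend can be worse than either endpoint, which may explain why the most recent weights win here.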
    [P] Recommendations for high frequency multivariate time series data
    Hey there! I'm looking for advice on datasets to use for a project. We are looking for the following traits:
    1) Multivariate (at least 3 or 4 features, and probably no more than 50 or 100 as an upper bound).
    2) High frequency (ideally a reading at least once every 5-10 minutes).
    3) Some notion of an underlying 'state' of the data for certain windows. E.g. in an energy setting, period X was the 'family at home using appliances' state; in a healthcare setting, period X is 'the patient is in a stable state' while period Y is something like 'the patient experiences a cardiac event'.
    Nice to have:
    4) It'd be great if some features had some level of seasonality while others didn't.
    Do folks have any recommendations for datasets that meet some (or hopefully all) of the criteria? I did some light perusing on UCI, but it seems like much of it is not high frequency enough and/or doesn't have a notion of underlying states. submitted by /u/CS_Student95 [link] [comments]  ( 1 min )
    [D] What to do when the authors don't release source code?
    Hello, I am currently working on a research paper that I aim to submit to a reputable conference soon. In our work, we borrow a feature engineering technique from a paper that was the first to apply it to our domain (time series AD). However, the authors haven't released the source code of their model implementation (though the feature engineering technique itself is publicly documented). I feel that it is an important baseline, and simply failing to include it could get my paper rejected. I have contacted all the authors for the source code, but none of them responded. The architecture they use is fairly complicated and would be very difficult to implement on my own. How do I go about this situation? My advisor told me I can just include a footnote on why we don't include it as a baseline, the points being:
    - No open-source implementation
    - Contacted the authors, didn't receive a response
    - The paper has not been published, only uploaded to arXiv
    Any help is appreciated! submitted by /u/mythrowaway0852 [link] [comments]  ( 5 min )
    AI when given the prompt of “Amy Schumer” on wombo.art
    submitted by /u/9YearOldGeneralOfPew [link] [comments]  ( 1 min )
    AI News: New Robot Fingertips Can Feel | AI Tracking Satellite | SingularityDAO DynaSets | Tesla Optimus Specs
    submitted by /u/getrich_or_diemining [link] [comments]
    a quick high-level overview of diffusion models (like dall-e 2)
    submitted by /u/individual_kex [link] [comments]
    How can I train an AI to write articles based on my own work?
    Hi all! As a sort of art experiment, I want to train an AI to write tech news articles based on my own work. I worked as a freelance writer for several years and have thousands of articles (each as a Word doc) on tech news. I want to use those articles to train the AI, then have it generate new articles to post to a blog. I have a pretty good understanding of machine learning, but have never trained a model myself. I'm hoping you all can provide some direction. Some specific questions: Can you recommend a model? For each training article, can I provide a "source" (like another news article) so the AI understands where the content in the training article came from? * For each generated article, can I provide a news article source for it to base its content on? ** Can I use the Word docs as the training set, or do I need to convert them into something else for training? *as an example: If I wrote an article on the release of a new Raspberry Pi board, my source might be the press release on the Raspberry Pi website. **as an example: If I want it to generate an article about a new drone delivery service, my input source might be a news article on Reuters or something. submitted by /u/TheSerialHobbyist [link] [comments]  ( 1 min )
    Are there any open source video ad generation models out there?
    Hey, are there any models to generate videos for advertisement, either text-to-video, images-to-video, or video-variation creation? If not, would video variation generative models be a good fit for creating ads? submitted by /u/National-Departure78 [link] [comments]
    DALL-E 2, the future of AI research, and OpenAI’s business model
    submitted by /u/bendee983 [link] [comments]
    how do i learn artificial intelligence from the basics?
    Are there any resources with example-driven explanations, from scratch or the basics? I have seen some websites just jump into "use this module/library" without explaining what it does or how it works. I'm looking for basic examples so that I can build on top of them or experiment on my own. submitted by /u/-1Mbps [link] [comments]  ( 1 min )
    Using the NEAT algorithm to teach elves to deliver presents
    submitted by /u/zuparnowa [link] [comments]
    Trippy AI Dream 16 - Gothic Style Jungle Fever - VQGAN CliP Rife-Rea...
    submitted by /u/LordPewPew777 [link] [comments]
    Trippy AI Dream 23 - Flower Power² VQGAN CliP Rife-RealESRGAN
    submitted by /u/LordPewPew777 [link] [comments]
    Trippy AI Dream 32 - WE REACHED 100 SUBSCRIBERS !!
    submitted by /u/LordPewPew777 [link] [comments]
    MindSpore has implemented a visible-infrared recognition algorithm
    submitted by /u/Creative_Habit_6868 [link] [comments]
    I want to learn AI from the beginning. Where can I start?
    submitted by /u/Late_Illustrator_545 [link] [comments]  ( 1 min )
    Baidu Researchers Propose PP-YOLOE Object Detector: an Evolved Version of YOLO Achieving SOTA Performance in Object Detection
    Object detection is a crucial problem in computer vision, and YOLO (You Only Look Once) one-stage object detectors have set the bar for performance since the release of YOLOv1 in 2015. The YOLO series has undergone considerable network and structural improvements over the years. The most recent version, YOLOX, has attained an optimal balance of speed and accuracy on the NVIDIA Tesla V100 Tensor Core GPU. Baidu researchers have improved their earlier PP-YOLOv2 model, resulting in PP-YOLOE, a cutting-edge industrial object detector that beats YOLOv5 and YOLOX in the speed-accuracy trade-off. The team’s PP-YOLOE-l variant outperforms PP-YOLOv2 by 1.9 percent AP and YOLOX-l by 1.3 percent AP on COCO datasets. The PP-YOLOv2 baseline model architecture comprises a ResNet50-vd backbone with deformable convolution, a PAN neck with an SPP layer and DropBlock, and a lightweight IoU-aware head. PP-YOLOv2 assigns only one anchor box to each ground-truth object, similar to YOLOv3. This is strongly reliant on hand-crafted design, which may not generalize well when trained on other datasets, and the technique necessitates many additional hyperparameters. To overcome this problem, Baidu researchers have added an anchor-free technique to PP-YOLOv2 that tiles one anchor point on each pixel and sets upper and lower bounds for the detection heads when assigning ground truths to the matching feature map. The center of a bounding box can then be used to select positive samples from the closest pixels. A 4D vector is also predicted for regression; the changes bring minor model speedups at a small cost in precision. Continue Reading Paper: https://arxiv.org/pdf/2203.16250.pdf Github: https://github.com/PaddlePaddle/PaddleDetection submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Songwriting AI
    Hi all, I’m hoping someone could point me in the direction of an AI that I could dump all my previous songwriting into and that would spit out something "inspired by" it. Mostly a bit of fun, but I'm interested in seeing what it throws back at me. Thanks in advance for any hot tips. submitted by /u/doccaballero [link] [comments]  ( 1 min )
    How to train the NN model with a custom dataset?
    Hi all. I have been trying to work on an object detection project, basically playing around with the code in the documentation on a custom dataset. I am using YOLOv3 and I trained my model using Darknet, and it seems like the model is learning incorrectly, because the resulting weights are not right either. I don't know how to check that, but when doing forward propagation the array seems to be OK, without NaN values, yet the confidence is mostly 0's and some 0.25's. Can anyone guide me on where I could have gone wrong? code: https://opencv-tutorial.readthedocs.io/en/latest/yolo/yolo.html [yolov3 section] output: outputs = [[0.03846154 0.03846154 0.27884614 0.21634616 0.5 0.25 ] [0.03846154 0.03846154 0. 0.47596154 0. 0. ] [0.03846154 0.03846154 0.89663464 0.78365386 0.5 0.25 ] ... [0.99038464 0.99038464 0.02403846 0.03125 0.5 0.25 ] [0.99038464 0.99038464 0.03846154 0.07211538 0.5 0. ] [0.99038464 0.99038464 0.07932692 0.05528846 0.5 0. ]] confidence = 0.25 0.0 0.25 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.25 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.25 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 ... submitted by /u/ersa17 [link] [comments]  ( 1 min )
    Two-stage Training of Graph Neural Networks for Graph Classification. (arXiv:2011.05097v4 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have received massive attention in the field of machine learning on graphs. Inspired by the success of neural networks, a line of research has been conducted to train GNNs to deal with various tasks, such as node classification, graph classification, and link prediction. In this work, our task of interest is graph classification. Several GNN models have been proposed and have shown great accuracy in this task. However, the question is whether usual training methods fully realize the capacity of the GNN models. In this work, we propose a two-stage training framework based on triplet loss. In the first stage, a GNN is trained to map each graph to a Euclidean-space vector so that graphs of the same class are close while those of different classes are mapped far apart. Once graphs are well-separated based on labels, a classifier is trained to distinguish between different classes. This method is generic in the sense that it is compatible with any GNN model. By adapting five GNN models to our method, we demonstrate consistent improvements in accuracy and in the utilization of each GNN's allocated capacity over the original training method of each model, by up to 5.4 percentage points on 12 datasets.  ( 2 min )
    Overlapping Spaces for Compact Graph Representations. (arXiv:2007.02445v3 [cs.LG] UPDATED)
    Various non-trivial spaces are becoming popular for embedding structured data such as graphs, texts, or images. Following spherical and hyperbolic spaces, more general product spaces have been proposed. However, searching for the best configuration of product space is a resource-intensive procedure, which reduces the practical applicability of the idea. We generalize the concept of product space and introduce an overlapping space that does not have the configuration search problem. The main idea is to allow subsets of coordinates to be shared between spaces of different types (Euclidean, hyperbolic, spherical). As a result, parameter optimization automatically learns the optimal configuration. Additionally, overlapping spaces allow for more compact representations since their geometry is more complex. Our experiments confirm that overlapping spaces outperform the competitors in graph embedding tasks. Here, we consider both distortion setup, where the aim is to preserve distances, and ranking setup, where the relative order should be preserved. The proposed method effectively solves the problem and outperforms the competitors in both settings. We also perform an empirical analysis in a realistic information retrieval task, where we compare all spaces by incorporating them into DSSM. In this case, the proposed overlapping space consistently achieves nearly optimal results without any configuration tuning. This allows for reducing training time, which can be significant in large-scale applications.  ( 2 min )
    CONet: Channel Optimization for Convolutional Neural Networks. (arXiv:2108.06822v2 [cs.CV] UPDATED)
    Neural Architecture Search (NAS) has shifted network design from using human intuition to leveraging search algorithms guided by evaluation metrics. We study channel size optimization in convolutional neural networks (CNN) and identify the role it plays in model accuracy and complexity. Current channel size selection methods are generally limited by discrete sample spaces while suffering from manual iteration and simple heuristics. To solve this, we introduce an efficient dynamic scaling algorithm -- CONet -- that automatically optimizes channel sizes across network layers for a given CNN. Two metrics -- "\textit{Rank}" and "\textit{Rank Average Slope}" -- are introduced to identify the information accumulated in training. The algorithm dynamically scales channel sizes up or down over a fixed searching phase. We conduct experiments on CIFAR10/100 and ImageNet datasets and show that CONet can find efficient and accurate architectures searched in ResNet, DARTS, and DARTS+ spaces that outperform their baseline models. This document supersedes the previously published paper in the ICCV2021-NeurArch workshop. An additional section is included on manual scaling of channel sizes in CNNs to numerically validate the metrics used in searching for optimal channel configurations.  ( 2 min )
    Covariance-Free Sparse Bayesian Learning. (arXiv:2105.10439v2 [eess.SP] UPDATED)
    Sparse Bayesian learning (SBL) is a powerful framework for tackling the sparse coding problem while also providing uncertainty quantification. The most popular inference algorithms for SBL exhibit prohibitively large computational costs for high-dimensional problems due to the need to maintain a large covariance matrix. To resolve this issue, we introduce a new method for accelerating SBL inference -- named covariance-free expectation maximization (CoFEM) -- that avoids explicit computation of the covariance matrix. CoFEM solves multiple linear systems to obtain unbiased estimates of the posterior statistics needed by SBL. This is accomplished by exploiting innovations from numerical linear algebra such as preconditioned conjugate gradient and a little-known diagonal estimation rule. For a large class of compressed sensing matrices, we provide theoretical justifications for why our method scales well in high-dimensional settings. Through simulations, we show that CoFEM can be up to thousands of times faster than existing baselines without sacrificing coding accuracy. Through applications to calcium imaging deconvolution and multi-contrast MRI reconstruction, we show that CoFEM enables SBL to tractably tackle high-dimensional sparse coding problems of practical interest.  ( 2 min )
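    A toy sketch of the two ingredients the abstract names -- conjugate gradient solves plus a Hutchinson-style diagonal estimation rule -- used to estimate diag(A⁻¹) without ever forming the inverse (a simplified stand-in for CoFEM's posterior-statistics step, not the authors' implementation):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=200):
    """Minimal CG solver for a symmetric positive-definite system A x = b."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def estimate_inverse_diagonal(A, num_probes=50, seed=0):
    """Diagonal estimation rule: diag(A^-1) ~ mean over Rademacher probes z
    of z * (A^-1 z), with each solve done via CG (no covariance matrix)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    acc = np.zeros(n)
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        acc += z * conjugate_gradient(A, z)
    return acc / num_probes

A = np.diag([2.0, 4.0, 5.0])   # toy SPD matrix; exact inverse diagonal known
est = estimate_inverse_diagonal(A)
print(np.round(est, 2))        # close to [0.5, 0.25, 0.2]
```

    In the full method these pieces run with preconditioning on compressed-sensing-scale matrices; the toy diagonal case is just the smallest instance where the estimator's behavior is easy to check.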
    MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories. (arXiv:2106.01808v3 [cs.LG] UPDATED)
    Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice. One class of methods uses data simulated with different parameters to infer models of the likelihood-to-evidence ratio, or equivalently the posterior function. Here we frame the inference task as an estimation of an energy function parametrized with an artificial neural network. We present an intuitive approach where the optimal model of the likelihood-to-evidence ratio is found by maximizing the likelihood of simulated data. Within this framework, the connection between the task of simulation-based inference and mutual information maximization is clear, and we show how several known methods of posterior estimation relate to alternative lower bounds to mutual information. These distinct objective functions aim at the same optimal energy form and therefore can be directly benchmarked. We compare their accuracy in the inference of model parameters, focusing on four dynamical systems that encompass common challenges in time series analysis: dynamics driven by multiplicative noise, nonlinear interactions, chaotic behavior, and high-dimensional parameter space.  ( 2 min )
    Learning Polynomial Transformations. (arXiv:2204.04209v1 [cs.LG])
    We consider the problem of learning high dimensional polynomial transformations of Gaussians. Given samples of the form $p(x)$, where $x\sim N(0, \mathrm{Id}_r)$ is hidden and $p: \mathbb{R}^r \to \mathbb{R}^d$ is a function where every output coordinate is a low-degree polynomial, the goal is to learn the distribution over $p(x)$. This problem is natural in its own right, but is also an important special case of learning deep generative models, namely pushforwards of Gaussians under two-layer neural networks with polynomial activations. Understanding the learnability of such generative models is crucial to understanding why they perform so well in practice. Our first main result is a polynomial-time algorithm for learning quadratic transformations of Gaussians in a smoothed setting. Our second main result is a polynomial-time algorithm for learning constant-degree polynomial transformations of Gaussians in a smoothed setting, when the rank of the associated tensors is small. In fact our results extend to any rotation-invariant input distribution, not just Gaussian. These are the first end-to-end guarantees for learning a pushforward under a neural network with more than one layer. Along the way, we also give the first polynomial-time algorithms with provable guarantees for tensor ring decomposition, a popular generalization of tensor decomposition that is used in practice to implicitly store large tensors.  ( 2 min )
    Measuring AI Systems Beyond Accuracy. (arXiv:2204.04211v1 [cs.SE])
    Current test and evaluation (T&E) methods for assessing machine learning (ML) system performance often rely on incomplete metrics. Testing is additionally often siloed from the other phases of the ML system lifecycle. Research investigating cross-domain approaches to ML T&E is needed to drive the state of the art forward and to build an Artificial Intelligence (AI) engineering discipline. This paper advocates for a robust, integrated approach to testing by outlining six key questions for guiding a holistic T&E strategy.  ( 2 min )
    Learning-Based Vulnerability Analysis of Cyber-Physical Systems. (arXiv:2103.06271v3 [cs.CR] UPDATED)
    This work focuses on the use of deep learning for vulnerability analysis of cyber-physical systems (CPS). Specifically, we consider a control architecture widely used in CPS (e.g., robotics), where the low-level control is based on, e.g., the extended Kalman filter (EKF) and an anomaly detector. To facilitate analyzing the impact that potential sensing attacks could have, our objective is to develop learning-enabled attack generators capable of designing stealthy attacks that maximally degrade system operation. We show how such a problem can be cast within a learning-based grey-box framework where parts of the runtime information are known to the attacker, and introduce two models based on feed-forward neural networks (FNN); both models are trained offline, using a cost function that combines the attack effects on the estimation error and the residual signal used for anomaly detection, so that the trained models are capable of recursively generating such effective sensor attacks in real-time. The effectiveness of the proposed methods is illustrated on several case studies.  ( 2 min )
    A Low-Cost Robot Science Kit for Education with Symbolic Regression for Hypothesis Discovery and Validation. (arXiv:2204.04187v1 [cond-mat.mtrl-sci])
    The next generation of physical science involves robot scientists - autonomous physical science systems capable of experimental design, execution, and analysis in a closed loop. Such systems have shown real-world success for scientific exploration and discovery, including the first discovery of a best-in-class material. To build and use these systems, the next generation workforce requires expertise in diverse areas including ML, control systems, measurement science, materials synthesis, decision theory, among others. However, education is lagging. Educators need a low-cost, easy-to-use platform to teach the required skills. Industry can also use such a platform for developing and evaluating autonomous physical science methodologies. We present the next generation in science education, a kit for building a low-cost autonomous scientist. The kit was used during two courses at the University of Maryland to teach undergraduate and graduate students autonomous physical science. We discuss its use in the course and its greater capability to teach the dual tasks of autonomous model exploration, optimization, and determination, with an example of autonomous experimental "discovery" of the Henderson-Hasselbalch equation.  ( 2 min )
    Neural graph embeddings via matrix factorization for link prediction: smoothing or truncating negatives?. (arXiv:2011.09907v2 [cs.SI] UPDATED)
    Learning good quality neural graph embeddings has long been achieved by minimizing the pointwise mutual information (PMI) for co-occurring nodes in simulated random walks. This design choice has been mostly popularized by the direct application of the highly-successful word embedding algorithm word2vec to predicting the formation of new links in social, co-citation, and biological networks. However, such a skeuomorphic design of graph embedding methods entails a truncation of information coming from pairs of nodes with low PMI. To circumvent this issue, we propose an improved approach to learning low-rank factorization embeddings that incorporates information from such unlikely pairs of nodes, and show that it can improve the link prediction performance of baseline methods by 1.2% to 24.2%. Based on our results and observations, we outline further steps that could improve the design of future graph embedding algorithms based on matrix factorization.  ( 2 min )
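    For context, the baseline this paper critiques is factorizing a truncated (clipped-at-zero) PMI matrix; a minimal NumPy sketch on a toy co-occurrence matrix (illustrative counts, not the paper's method):

```python
import numpy as np

# Toy co-occurrence counts between 4 nodes (e.g. from random walks);
# two tight pairs (0,1) and (2,3) with weak cross-pair co-occurrence.
C = np.array([
    [10., 8., 1., 1.],
    [ 8., 10., 1., 1.],
    [ 1., 1., 10., 8.],
    [ 1., 1., 8., 10.],
])

p_ij = C / C.sum()
p_i = p_ij.sum(axis=1, keepdims=True)
p_j = p_ij.sum(axis=0, keepdims=True)

pmi = np.log(p_ij / (p_i * p_j))
# Truncation step the paper questions: negative PMI entries clipped to 0,
# discarding the information carried by unlikely node pairs.
ppmi = np.maximum(pmi, 0.0)

# Low-rank factorization of the truncated PMI matrix -> node embeddings
U, S, _ = np.linalg.svd(ppmi)
embeddings = U[:, :2] * np.sqrt(S[:2])
print(embeddings.shape)  # (4, 2)
```

    The proposed method instead keeps (smooths) the low-PMI entries rather than truncating them before factorization.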
    Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning. (arXiv:2204.04170v1 [eess.AS])
    Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework various data augmentation techniques are usually exploited to help enforce desired invariances within the learned representations, improving performance on various audio tasks thanks to more robust embeddings. Now, selecting the most relevant augmentations has proven crucial for better downstream performances. Thus, this work introduces a conditional independence-based method which allows for automatically selecting a suitable distribution on the choice of augmentations and their parametrization from a set of predefined ones, for contrastive self-supervised pre-training. This is performed with respect to a downstream task of interest, hence saving a costly hyper-parameter search. Experiments performed on two different downstream tasks validate the proposed approach showing better results than experimenting without augmentation or with baseline augmentations. We furthermore conduct a qualitative analysis of the automatically selected augmentations and their variation according to the considered final downstream dataset.  ( 2 min )
    Benchmarks, Algorithms, and Metrics for Hierarchical Disentanglement. (arXiv:2102.05185v4 [cs.LG] UPDATED)
    In representation learning, there has been recent interest in developing algorithms to disentangle the ground-truth generative factors behind a dataset, and metrics to quantify how fully this occurs. However, these algorithms and metrics often assume that both representations and ground-truth factors are flat, continuous, and factorized, whereas many real-world generative processes involve rich hierarchical structure, mixtures of discrete and continuous variables with dependence between them, and even varying intrinsic dimensionality. In this work, we develop benchmarks, algorithms, and metrics for learning such hierarchical representations.  ( 2 min )
    Adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints. (arXiv:1911.11397v3 [eess.SY] UPDATED)
    This paper presents a constrained adaptive dynamic programming (CADP) algorithm to solve general nonlinear nonaffine optimal control problems with known dynamics. Unlike previous ADP algorithms, it can directly deal with problems with state constraints. Firstly, a constrained generalized policy iteration (CGPI) framework is developed to handle state constraints by transforming the traditional policy improvement process into a constrained policy optimization problem. Next, we propose an actor-critic variant of CGPI, called CADP, in which both policy and value functions are approximated by multi-layer neural networks to directly map the system states to control inputs and value function, respectively. CADP linearizes the constrained optimization problem locally into a quadratically constrained linear programming problem, and then obtains the optimal update of the policy network by solving its dual problem. A trust region constraint is added to prevent excessive policy update, thus ensuring linearization accuracy. We determine the feasibility of the policy optimization problem by calculating the minimum trust region boundary and update the policy using two recovery rules when infeasible. The vehicle control problem in the path-tracking task is used to demonstrate the effectiveness of this proposed method.  ( 2 min )
    TF-Coder: Program Synthesis for Tensor Manipulations. (arXiv:2003.09040v4 [cs.PL] UPDATED)
    The success and popularity of deep learning is on the rise, partially due to powerful deep learning frameworks such as TensorFlow and PyTorch that make it easier to develop deep learning models. However, these libraries also come with steep learning curves, since programming in these frameworks is quite different from traditional imperative programming with explicit loops and conditionals. In this work, we present a tool called TF-Coder for programming by example in TensorFlow. TF-Coder uses a bottom-up weighted enumerative search, with value-based pruning of equivalent expressions and flexible type- and value-based filtering to ensure that expressions adhere to various requirements imposed by the TensorFlow library. We train models to predict TensorFlow operations from features of the input and output tensors and natural language descriptions of tasks, to prioritize relevant operations during search. TF-Coder solves 63 of 70 real-world tasks within 5 minutes, sometimes finding simpler solutions in less time compared to experienced human programmers.  ( 2 min )
    Karaoker: Alignment-free singing voice synthesis with speech training data. (arXiv:2204.04127v1 [eess.AS])
    Existing singing voice synthesis models (SVS) are usually trained on singing data and depend on either error-prone time-alignment and duration features or explicit music score information. In this paper, we propose Karaoker, a multispeaker Tacotron-based model conditioned on voice characteristic features that is trained exclusively on spoken data without requiring time-alignments. Karaoker synthesizes singing voice following a multi-dimensional template extracted from a source waveform of an unseen speaker/singer. The model is jointly conditioned with a single deep convolutional encoder on continuous data including pitch, intensity, harmonicity, formants, cepstral peak prominence and octaves. We extend the text-to-speech training objective with feature reconstruction, classification and speaker identification tasks that guide the model to an accurate result. Except for multi-tasking, we also employ a Wasserstein GAN training scheme as well as new losses on the acoustic model's output to further refine the quality of the model.  ( 2 min )
    On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging. (arXiv:2107.00464v4 [math.OC] UPDATED)
    We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrast with the basic SEG method, whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and such a rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.  ( 2 min )
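    The averaging effect is easy to reproduce on the scalar bilinear game min_x max_y xy; a toy same-sample SEG run (simplified sketch under assumed step size and noise level, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
eta, T, noise_std = 0.3, 2000, 0.05
x, y = 1.0, 1.0                 # start away from the Nash point (0, 0)
avg_x = avg_y = 0.0

# Same-sample SEG on min_x max_y x*y: the extrapolation and update steps
# within one iteration reuse the same stochastic gradient noise (nx, ny).
for t in range(1, T + 1):
    nx, ny = rng.normal(0.0, noise_std, size=2)
    # extrapolation (half) step
    xh = x - eta * (y + ny)
    yh = y + eta * (x + nx)
    # update step using gradients evaluated at the half point
    x = x - eta * (yh + ny)
    y = y + eta * (xh + nx)
    # running average of the iterates
    avg_x += (x - avg_x) / t
    avg_y += (y - avg_y) / t

print(round(np.hypot(x, y), 3), round(np.hypot(avg_x, avg_y), 3))
```

    The last iterate keeps hovering in a noise-sized neighborhood of the equilibrium, while the averaged iterate lands much closer to (0, 0), matching the abstract's contrast between the two.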
    GPSAF: A Generalized Probabilistic Surrogate-Assisted Framework for Constrained Single- and Multi-objective Optimization. (arXiv:2204.04054v1 [math.OC])
    Significant effort has been made to solve computationally expensive optimization problems in the past two decades, and various optimization methods incorporating surrogates into optimization have been proposed. Most research focuses on either exploiting the surrogate by defining a utility optimization problem or customizing an existing optimization method to use one or multiple approximation models. However, only a little attention has been paid to generic concepts applicable to different types of algorithms and optimization problems simultaneously. Thus this paper proposes a generalized probabilistic surrogate-assisted framework (GPSAF), applicable to a broad category of unconstrained and constrained, single- and multi-objective optimization algorithms. The idea is based on a surrogate assisting an existing optimization method. The assistance is based on two distinct phases, one facilitating exploration and another exploiting the surrogates. The exploration and exploitation of surrogates are automatically balanced by performing a probabilistic knockout tournament among different clusters of solutions. A study of multiple well-known population-based optimization algorithms is conducted with and without the proposed surrogate assistance on single- and multi-objective optimization problems with a maximum solution evaluation budget of 300 or less. The results indicate the effectiveness of applying GPSAF to an optimization algorithm and the competitiveness with other surrogate-assisted algorithms.  ( 2 min )
    Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection. (arXiv:2204.04042v1 [cs.CL])
    Behavioural testing -- verifying system capabilities by validating human-designed input-output pairs -- is an alternative evaluation method of natural language processing systems proposed to address the shortcomings of the standard approach: computing metrics on held-out data. While behavioural tests capture human prior knowledge and insights, there has been little exploration on how to leverage them for model training and development. With this in mind, we explore behaviour-aware learning by examining several fine-tuning schemes using HateCheck, a suite of functional tests for hate speech detection systems. To address potential pitfalls of training on data originally intended for evaluation, we train and evaluate models on different configurations of HateCheck by holding out categories of test cases, which enables us to estimate performance on potentially overlooked system properties. The fine-tuning procedure led to improvements in the classification accuracy of held-out functionalities and identity groups, suggesting that models can potentially generalise to overlooked functionalities. However, performance on held-out functionality classes and i.i.d. hate speech detection data decreased, which indicates that generalisation occurs mostly across functionalities from the same class and that the procedure led to overfitting to the HateCheck data distribution.  ( 2 min )
    Neural network training under semidefinite constraints. (arXiv:2201.00632v2 [cs.LG] UPDATED)
    This paper is concerned with the training of neural networks (NNs) under semidefinite constraints, which allows for NN training with robustness and stability guarantees. In particular, we set up an efficient and scalable training scheme for NN training problems of this kind based on interior point methods, while we also exploit the structure of the underlying matrix constraint. We apply our training scheme to several relevant examples that have been studied in the literature and newly present the application of the method to the training of Wasserstein generative adversarial networks (WGANs). In numerical examples, we show the superiority of our method and its applicability to WGAN training.
    Text-Aware Predictive Monitoring of Business Processes. (arXiv:2104.09962v2 [cs.AI] CROSS LISTED)
    The real-time prediction of business processes using historical event data is an important capability of modern business process monitoring systems. Existing process prediction methods are able to also exploit the data perspective of recorded events, in addition to the control-flow perspective. However, while well-structured numerical or categorical attributes are considered in many prediction techniques, almost no technique is able to utilize text documents written in natural language, which can hold information critical to the prediction task. In this paper, we illustrate the design, implementation, and evaluation of a novel text-aware process prediction model based on Long Short-Term Memory (LSTM) neural networks and natural language models. The proposed model can take categorical, numerical and textual attributes in event data into account to predict the activity and timestamp of the next event, the outcome, and the cycle time of a running process instance. Experiments show that the text-aware model is able to outperform state-of-the-art process prediction methods on simulated and real-world event logs containing textual data.
    Neural Network Optimization for Reinforcement Learning Tasks Using Sparse Computations. (arXiv:2201.02571v2 [cs.LG] UPDATED)
    This article proposes a sparse computation-based method for optimizing neural networks for reinforcement learning (RL) tasks. This method combines two ideas: neural network pruning and taking into account input data correlations; it makes it possible to update neuron states only when changes in them exceed a certain threshold. It significantly reduces the number of multiplications when running neural networks. We tested different RL tasks and achieved 20-150x reduction in the number of multiplications. There were no substantial performance losses; sometimes the performance even improved.
    A Manifold View of Adversarial Risk. (arXiv:2203.13277v2 [cs.LG] UPDATED)
    The adversarial risk of a machine learning model has been widely studied. Most previous works assume that the data lies in the whole ambient space. We propose to take a new angle and take the manifold assumption into consideration. Assuming data lies in a manifold, we investigate two new types of adversarial risk, the normal adversarial risk due to perturbation along the normal direction, and the in-manifold adversarial risk due to perturbation within the manifold. We prove that the classic adversarial risk can be bounded from both sides using the normal and in-manifold adversarial risks. We also show with a surprisingly pessimistic case that the standard adversarial risk can be nonzero even when both normal and in-manifold risks are zero. We conclude the paper with empirical studies supporting our theoretical results. Our results suggest the possibility of improving the robustness of a classifier by only focusing on the normal adversarial risk.
    An analysis of over-sampling labeled data in semi-supervised learning with FixMatch. (arXiv:2201.00604v2 [cs.LG] UPDATED)
    Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches. This paper studies whether this common practice improves learning and how. We compare it to an alternative setting where each mini-batch is uniformly sampled from all the training data, labeled or not, which greatly reduces direct supervision from true labels in typical low-label regimes. However, this simpler setting can also be seen as more general and even necessary in multi-task problems where over-sampling labeled data would become intractable. Our experiments on semi-supervised CIFAR-10 image classification using FixMatch show a performance drop when using the uniform sampling approach which diminishes when the amount of labeled data or the training time increases. Further, we analyse the training dynamics to understand how over-sampling of labeled data compares to uniform sampling. Our main finding is that over-sampling is especially beneficial early in training but gets less important in the later stages when more pseudo-labels become correct. Nevertheless, we also find that keeping some true labels remains important to avoid the accumulation of confirmation errors from incorrect pseudo-labels.
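    The two mini-batch regimes the paper compares can be sketched concretely (hypothetical dataset sizes and a 1/4 labeled fraction, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
labeled_idx = np.arange(40)            # 40 labeled examples (~4% of data)
unlabeled_idx = np.arange(40, 1000)    # 960 unlabeled examples
batch_size = 64

def oversampled_batch():
    """Common practice: fix the labeled share of every mini-batch (1/4 here),
    over-sampling the small labeled set relative to its share of the data."""
    lab = rng.choice(labeled_idx, size=batch_size // 4, replace=True)
    unl = rng.choice(unlabeled_idx, size=batch_size - batch_size // 4,
                     replace=False)
    return lab, unl

def uniform_batch():
    """Alternative studied in the paper: one uniform draw over all examples,
    labeled or not."""
    idx = rng.choice(1000, size=batch_size, replace=False)
    return idx[idx < 40], idx[idx >= 40]

lab, _ = oversampled_batch()
print(len(lab))        # always 16 labeled examples per batch
lab_u, _ = uniform_batch()
print(len(lab_u))      # only ~2-3 labeled examples per batch on average
```

    Under uniform sampling the direct supervision signal per batch shrinks drastically in low-label regimes, which is where the paper finds over-sampling matters most (especially early in training).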
    Combining Evolution and Deep Reinforcement Learning for Policy Search: a Survey. (arXiv:2203.14009v3 [cs.LG] UPDATED)
    Deep neuroevolution and deep Reinforcement Learning have received a lot of attention in the last years. Some works have compared them, highlighting their pros and cons, but an emerging trend consists in combining them so as to benefit from the best of both worlds. In this paper, we provide a survey of this emerging trend by organizing the literature into related groups of works and casting all the existing combinations in each group into a generic framework. We systematically cover all easily available papers irrespective of their publication status, focusing on the combination mechanisms rather than on the experimental results. In total, we cover 45 algorithms more recent than 2017. We hope this effort will favor the growth of the domain by facilitating the understanding of the relationships between the methods, leading to deeper analyses, outlining missing useful comparisons and suggesting new combinations of mechanisms.
    Generative Adversarial Method Based On Neural Tangent Kernels. (arXiv:2204.04090v1 [cs.LG])
    The recent development of Generative adversarial networks (GANs) has driven many computer vision applications. Despite the great synthesis quality, training GANs often confronts several issues, including non-convergence, mode collapse, and gradient vanishing. There exist several workarounds, for example, regularizing Lipschitz continuity and adopting Wasserstein distance. Although these methods can partially solve the problems, we argue that these problems result from modeling the discriminator with deep neural networks. In this paper, we build on the newly derived deep neural network theory called the Neural Tangent Kernel (NTK) and propose a new generative algorithm called generative adversarial NTK (GA-NTK). GA-NTK models the discriminator as a Gaussian process (GP). With the help of the NTK theories, the training dynamics of GA-NTK can be described with a closed-form formula. To synthesize data with the closed-form formula, the objectives can be simplified into a single-level adversarial optimization problem. We conduct extensive experiments on real-world datasets, and the results show that GA-NTK can generate images comparable to those by GANs but is much easier to train under various conditions. We also study the current limitations of GA-NTK and propose some workarounds to make GA-NTK more practical.  ( 2 min )
    Ranking with submodular functions on a budget. (arXiv:2204.04168v1 [cs.DS])
    Submodular maximization has been the backbone of many important machine-learning problems, and has applications to viral marketing, diversification, sensor placement, and more. However, the study of maximizing submodular functions has mainly been restricted in the context of selecting a set of items. On the other hand, many real-world applications require a solution that is a ranking over a set of items. The problem of ranking in the context of submodular function maximization has been considered before, but to a much lesser extent than item-selection formulations. In this paper, we explore a novel formulation for ranking items with submodular valuations and budget constraints. We refer to this problem as max-submodular ranking (MSR). In more detail, given a set of items and a set of non-decreasing submodular functions, where each function is associated with a budget, we aim to find a ranking of the set of items that maximizes the sum of values achieved by all functions under the budget constraints. For the MSR problem with cardinality- and knapsack-type budget constraints we propose practical algorithms with approximation guarantees. In addition, we perform an empirical evaluation, which demonstrates the superior performance of the proposed algorithms against strong baselines.  ( 2 min )
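    For readers new to the area, the classic baseline behind such formulations is greedy selection under a cardinality budget, shown here on a toy max-coverage instance (the MSR setting generalizes this to rankings serving several budgeted functions at once; illustrative sketch, not the paper's algorithm):

```python
# Toy max-coverage instance: each item covers a set of elements, and
# coverage is a monotone submodular function, so greedy selection gives
# the classic (1 - 1/e) approximation under a cardinality budget.
items = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {5, 6},
    "d": {1, 5},
}
budget = 2

covered, chosen = set(), []
for _ in range(budget):
    # pick the unchosen item with the largest marginal coverage gain
    best = max(items,
               key=lambda i: len(items[i] - covered) if i not in chosen else -1)
    chosen.append(best)
    covered |= items[best]

print(chosen, len(covered))  # ['a', 'c'] 5
```

    In the ranking setting, the output is an ordering of all items rather than a subset, and each submodular function only "sees" the prefix of the ranking that fits its budget.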
    Sample Complexity versus Depth: An Information Theoretic Analysis. (arXiv:2203.00246v3 [cs.LG] UPDATED)
    Deep learning has proven effective across a range of data sets. In light of this, a natural inquiry is: "for what data generating processes can deep learning succeed?" In this work, we study the sample complexity of learning multilayer data generating processes of a sort for which deep neural networks seem to be suited. We develop general and elegant information-theoretic tools that accommodate analysis of any data generating process -- shallow or deep, parametric or nonparametric, noiseless or noisy. We then use these tools to characterize the dependence of sample complexity on the depth of multilayer processes. Our results indicate roughly linear dependence on depth. This is in contrast to previous results that suggest exponential or high-order polynomial dependence.
    Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures. (arXiv:2112.05224v2 [cs.CR] UPDATED)
    We investigate a new threat to neural sequence-to-sequence (seq2seq) models: training-time attacks that cause models to "spin" their outputs so as to support an adversary-chosen sentiment or point of view -- but only when the input contains adversary-chosen trigger words. For example, a spinned summarization model outputs positive summaries of any text that mentions the name of some individual or organization. Model spinning introduces a "meta-backdoor" into a model. Whereas conventional backdoors cause models to produce incorrect outputs on inputs with the trigger, outputs of spinned models preserve context and maintain standard accuracy metrics, yet also satisfy a meta-task chosen by the adversary. Model spinning enables propaganda-as-a-service, where propaganda is defined as biased speech. An adversary can create customized language models that produce desired spins for chosen triggers, then deploy these models to generate disinformation (a platform attack), or else inject them into ML training pipelines (a supply-chain attack), transferring malicious functionality to downstream models trained by victims. To demonstrate the feasibility of model spinning, we develop a new backdooring technique. It stacks an adversarial meta-task onto a seq2seq model, backpropagates the desired meta-task output to points in the word-embedding space we call "pseudo-words," and uses pseudo-words to shift the entire output distribution of the seq2seq model. We evaluate this attack on language generation, summarization, and translation models with different triggers and meta-tasks such as sentiment, toxicity, and entailment. Spinned models largely maintain their accuracy metrics (ROUGE and BLEU) while shifting their outputs to satisfy the adversary's meta-task. We also show that, in the case of a supply-chain attack, the spin functionality transfers to downstream models.
    Federated Learning with Adaptive Batchnorm for Personalized Healthcare. (arXiv:2112.00734v2 [cs.LG] UPDATED)
There is a growing interest in applying machine learning techniques for healthcare. Recently, federated machine learning (FL) is gaining popularity since it allows researchers to train powerful models without compromising data privacy and security. However, the performance of existing FL approaches often deteriorates when encountering non-iid situations where there exist distribution gaps among clients, and few previous efforts focus on personalization in healthcare. In this article, we propose AdaFed to tackle domain shifts and obtain personalized models for local clients. AdaFed learns the similarity between clients via the statistics of the batch normalization layers while preserving the specificity of each client with different local batch normalization. Comprehensive experiments on five healthcare benchmarks demonstrate that AdaFed achieves better accuracy compared to state-of-the-art methods (e.g., 10%+ accuracy improvement for PAMAP2) with faster convergence speed.
    Human Hands as Probes for Interactive Object Understanding. (arXiv:2112.09120v2 [cs.CV] UPDATED)
Interactive object understanding, or what we can do to objects and how, is a long-standing goal of computer vision. In this paper, we tackle this problem through observation of human hands in in-the-wild egocentric videos. We demonstrate that observing what human hands interact with, and how, can provide both the relevant data and the necessary supervision. Attending to hands readily localizes and stabilizes active objects for learning, and reveals places where interactions with objects occur. Analyzing the hands shows what we can do to objects and how. We apply these basic principles to the EPIC-KITCHENS dataset and successfully learn state-sensitive features and object affordances (regions of interaction and afforded grasps), purely by observing hands in egocentric videos.
    Image prediction of disease progression by style-based manifold extrapolation. (arXiv:2111.11439v2 [eess.IV] UPDATED)
Disease-modifying management aims to prevent deterioration and progression of the disease, not just relieve symptoms. Unfortunately, the development of necessary therapies is often hampered by the failure to recognize presymptomatic disease and by limited understanding of disease development. We present a generic solution to this problem: a methodology that allows the prediction of progression risk and morphology in individuals using a latent extrapolation optimization approach. To this end, we combined a regularized generative adversarial network (GAN) and a latent nearest neighbor algorithm for joint optimization to generate plausible images of future time points. We evaluated our method on osteoarthritis (OA) data from a multi-center longitudinal study (the Osteoarthritis Initiative, OAI). With presymptomatic baseline data, our model is generative and significantly outperforms the end-to-end learning model in discriminating the progressive cohort. Two experiments were performed with seven experienced radiologists. When no synthetic follow-up radiographs were provided, our model performed better than all seven radiologists. In cases where the synthetic follow-ups generated by our model were available, the specificity and sensitivity of all readers in discriminating progressors increased from $72.3\%$ to $88.6\%$ and from $42.1\%$ to $51.6\%$, respectively. Our results open up a new possibility of using model-based morphology and risk prediction to make predictions about future disease occurrence, as demonstrated in the example of OA.
    Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition. (arXiv:2110.05267v2 [eess.AS] UPDATED)
Speech enhancement (SE) aims to suppress the additive noise from a noisy speech signal to improve the speech's perceptual quality and intelligibility. However, the over-suppression phenomenon in the enhanced speech might degrade the performance of the downstream automatic speech recognition (ASR) task due to missing latent information. To alleviate this problem, we propose an interactive feature fusion network (IFF-Net) for noise-robust speech recognition to learn complementary information from the enhanced feature and the original noisy feature. Experimental results show that the proposed method achieves an absolute word error rate (WER) reduction of 4.1% over the best baseline on the RATS Channel-A corpus. Our further analysis indicates that the proposed IFF-Net can complement some missing information in the over-suppressed enhanced feature.
    Active Linear Regression for $\ell_p$ Norms and Beyond. (arXiv:2111.04888v3 [cs.LG] UPDATED)
    We study active sampling algorithms for linear regression, which aim to query only a few entries of a target vector $b\in\mathbb R^n$ and output a near minimizer to $\min_{x\in\mathbb R^d} \|Ax-b\|$, for a design matrix $A\in\mathbb R^{n \times d}$ and loss $\|\cdot\|$. For $p$ norm regression for any $0<p<\infty$, we give an algorithm based on Lewis weight sampling outputting a $(1+\epsilon)$-approximate solution using just $\tilde O(d/\epsilon^2)$ queries to $b$ for $p\in(0,1)$, $\tilde{O}(d/\epsilon)$ queries for $1<p<2$, and $\tilde{O}(d^{p/2}/\epsilon^p)$ queries for $2<p<\infty$. For $0<p<2$, our bounds are optimal up to log factors, settling the query complexity for this range. For $2<p<\infty$, our dependence on $d$ is optimal, while our dependence on $\epsilon$ is off by at most $\epsilon$, up to log factors. Our result resolves an open question of [CD21], who gave near optimal bounds for the $1$ norm, but required $d^2/\epsilon^2$ samples for $\ell_p$ regression with $1<p<2$, and gave no bounds for $2<p<\infty$ or $0<p<1$. We also give the first total sensitivity bound of $O(d^{\max\{1,p/2\}}\log^2n)$ for loss functions of degree $p$ polynomial growth, improving a result of [TMF20]. By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21]. For the Huber loss, we further improve our bound to $\tilde O(d^{4-2\sqrt2}/\mathrm{poly}(\epsilon))$ samples. Our sensitivity bounds also have many applications, including Orlicz norm subspace embeddings, robust subspace approximation, and dimension reduction for smoothed $p$-norms. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $p$ norm.
    DAD: Data-free Adversarial Defense at Test Time. (arXiv:2204.01568v2 [cs.LG] UPDATED)
    Deep models are highly susceptible to adversarial attacks. Such attacks are carefully crafted imperceptible noises that can fool the network and can cause severe consequences when deployed. To encounter them, the model requires training data for adversarial training or explicit regularization-based techniques. However, privacy has become an important concern, restricting access to only trained models but not the training data (e.g. biometric data). Also, data curation is expensive and companies may have proprietary rights over it. To handle such situations, we propose a completely novel problem of 'test-time adversarial defense in absence of training data and even their statistics'. We solve it in two stages: a) detection and b) correction of adversarial samples. Our adversarial sample detection framework is initially trained on arbitrary data and is subsequently adapted to the unlabelled test data through unsupervised domain adaptation. We further correct the predictions on detected adversarial samples by transforming them in Fourier domain and obtaining their low frequency component at our proposed suitable radius for model prediction. We demonstrate the efficacy of our proposed technique via extensive experiments against several adversarial attacks and for different model architectures and datasets. For a non-robust Resnet-18 model pre-trained on CIFAR-10, our detection method correctly identifies 91.42% adversaries. Also, we significantly improve the adversarial accuracy from 0% to 37.37% with a minimal drop of 0.02% in clean accuracy on state-of-the-art 'Auto Attack' without having to retrain the model.
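The correction stage's low-frequency idea can be sketched as a Fourier low-pass filter; this is an illustration under assumptions, the radius below is arbitrary, whereas the paper proposes a criterion for choosing a suitable one:

```python
import numpy as np

# Hedged sketch: keep only the low-frequency component of a 2-D input
# below a chosen radius in the (shifted) frequency plane, then invert.

def low_pass(img, radius):
    F = np.fft.fftshift(np.fft.fft2(img))      # spectrum, DC at the center
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

img = np.random.default_rng(0).standard_normal((32, 32))
out = low_pass(img, radius=8)
print(out.shape)  # (32, 32)
```

Discarding the high-frequency band removes some of the signal's energy, which is where imperceptible adversarial perturbations tend to concentrate; the model then predicts on the filtered input.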
    Measuring disentangled generative spatio-temporal representation. (arXiv:2202.04821v2 [cs.LG] UPDATED)
Disentangled representation learning offers useful properties such as dimension reduction and interpretability, which are essential to modern deep learning approaches. Although deep learning techniques have been widely applied to spatio-temporal data mining, little attention has been paid to further disentangling the latent features and understanding their contribution to the model performance, particularly their mutual information and correlation across features. In this study, we adopt two state-of-the-art disentangled representation learning methods and apply them to three large-scale public spatio-temporal datasets. To evaluate their performance, we propose an internal evaluation metric focusing on the degree of correlation among latent variables of the learned representations and the prediction performance of the downstream tasks. Empirical results show that our modified method can learn disentangled representations that achieve the same level of performance as existing state-of-the-art ST deep learning methods in a spatio-temporal sequence forecasting problem. Additionally, we find that our methods can be used to discover real-world spatio-temporal semantics that describe the variables in the learned representation.
    Identifiability of Label Noise Transition Matrix. (arXiv:2202.02016v2 [cs.LG] UPDATED)
    The noise transition matrix plays a central role in the problem of learning from noisy labels. Among many other reasons, a significant number of existing solutions rely on access to it. Estimating the transition matrix without using ground truth labels is a critical and challenging task. When label noise transition depends on each instance, the problem of identifying the instance-dependent noise transition matrix becomes substantially more challenging. Despite recent works proposing solutions for learning from instance-dependent noisy labels, we lack a unified understanding of when such a problem remains identifiable, and therefore learnable. This paper seeks to provide answers to a sequence of related questions: What are the primary factors that contribute to the identifiability of a noise transition matrix? Can we explain the observed empirical successes? When a problem is not identifiable, what can we do to make it so? We will relate our theoretical findings to the literature and hope to provide guidelines for developing effective solutions for battling instance-dependent label noise.
    A Spatial-Temporal Attention Multi-Graph Convolution Network for Ride-Hailing Demand Prediction Based on Periodicity with Offset. (arXiv:2203.12505v2 [cs.LG] UPDATED)
Ride-hailing services are becoming a leading part of urban transportation. To improve their efficiency, accurate prediction of transportation demand is a fundamental challenge. In this paper, we tackle this problem from both the network-structure and dataset-formulation aspects. For network design, we propose a spatial-temporal attention multi-graph convolution network (STA-MGCN). A spatial-temporal layer in STA-MGCN is developed to capture the temporal correlations via a temporal attention mechanism and temporal gated convolution, and the spatial correlations via multi-graph convolution. A feature cluster layer is introduced to learn latent regional functions and to reduce the computational burden. For the dataset formulation, we develop a novel approach that accounts for the transportation characteristic of periodicity with offset: instead of using only historical data from the same time period, the historical order demand in forward and backward neighboring time periods from yesterday and last week is also included. Extensive experiments on three real-world datasets from New York, Chicago, and Chengdu show that the proposed algorithm achieves state-of-the-art performance for ride-hailing demand prediction.
    AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning. (arXiv:2110.13005v4 [cs.LG] UPDATED)
In the last few years, the memory requirements to train state-of-the-art neural networks have far exceeded the DRAM capacities of modern hardware accelerators. This has necessitated the development of efficient algorithms to train these neural networks in parallel on large-scale GPU-based clusters. Since computation is relatively inexpensive on modern GPUs, designing and implementing extremely efficient communication in these parallel training algorithms is critical for extracting the maximum performance. This paper presents AxoNN, a parallel deep learning framework that exploits asynchrony and message-driven execution to schedule neural network operations on each GPU, thereby reducing GPU idle time and maximizing hardware efficiency. By using the CPU memory as a scratch space for offloading data periodically during training, AxoNN is able to reduce GPU memory consumption fourfold. This allows us to increase the number of parameters per GPU fourfold, thus reducing the amount of communication and increasing performance by over 13%. When tested against large transformer models with 12-100 billion parameters on 48-384 NVIDIA Tesla V100 GPUs, AxoNN achieves a per-GPU throughput of 49.4-54.78% of theoretical peak and reduces the training time by 22-37 days (15-25% speedup) as compared to the state-of-the-art.
    Federated Causal Inference in Heterogeneous Observational Data. (arXiv:2107.11732v3 [cs.LG] UPDATED)
    Analyzing observational data from multiple sources can be useful for increasing statistical power to detect a treatment effect; however, practical constraints such as privacy considerations may restrict individual-level information sharing across data sets. This paper develops federated methods that only utilize summary-level information from heterogeneous data sets. Our federated methods provide doubly-robust point estimates of treatment effects as well as variance estimates. We derive the asymptotic distributions of our federated estimators, which are shown to be asymptotically equivalent to the corresponding estimators from the combined, individual-level data. We show that to achieve these properties, federated methods should be adjusted based on conditions such as whether models are correctly specified and stable across heterogeneous data sets.
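As a minimal illustration of combining summary-level information across sites, here is classic inverse-variance pooling of per-site point and variance estimates. This is a simpler stand-in for the paper's doubly-robust federated estimators, and the function name is illustrative:

```python
import numpy as np

# Hedged sketch: pool per-site treatment-effect estimates using only
# summary-level information (point estimate + variance per site).

def pool(estimates, variances):
    """Inverse-variance weighted estimate and its variance."""
    w = 1.0 / np.asarray(variances, dtype=float)
    est = np.sum(w * np.asarray(estimates, dtype=float)) / np.sum(w)
    var = 1.0 / np.sum(w)
    return est, var

# Two sites report effect estimates 2.0 and 3.0, each with unit variance.
est, var = pool([2.0, 3.0], [1.0, 1.0])
print(est, var)  # 2.5 0.5
```

Note that the paper's contribution is precisely about when such summary-level aggregation remains valid, e.g. under model misspecification or heterogeneity across sites, which plain inverse-variance weighting does not address.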
    Low-Resource Adaptation of Open-Domain Generative Chatbots. (arXiv:2108.06329v2 [cs.CL] UPDATED)
Recent work building open-domain chatbots has demonstrated that increasing model size improves performance. On the other hand, latency and connectivity considerations dictate moving digital assistants onto the device. Giving a digital assistant like Siri, Alexa, or Google Assistant the ability to discuss just about anything leads to the need for reducing the chatbot model size such that it fits on the user's device. We demonstrate that low-parameter models can simultaneously retain their general-knowledge conversational abilities while improving in a specific domain. Additionally, we propose a generic framework that accounts for variety in question types, tracks reference throughout multi-turn conversations, and removes inconsistent and potentially toxic responses. Our framework seamlessly transitions between chatting and performing transactional tasks, which will ultimately make interactions with digital assistants more human-like. We evaluate our framework on 1 internal and 4 public benchmark datasets using both automatic (Perplexity) and human (SSA - Sensibleness and Specificity Average) evaluation metrics and establish comparable performance while reducing model parameters by 90%.
    GCA-Net : Utilizing Gated Context Attention for Improving Image Forgery Localization and Detection. (arXiv:2112.04298v3 [cs.CV] UPDATED)
    Forensic analysis of manipulated pixels requires the identification of various hidden and subtle features from images. Conventional image recognition models generally fail at this task because they are biased and more attentive toward the dominant local and spatial features. In this paper, we propose a novel Gated Context Attention Network (GCA-Net) that utilizes non-local attention in conjunction with a gating mechanism in order to capture the finer image discrepancies and better identify forged regions. The proposed framework uses high dimensional embeddings to filter and aggregate the relevant context from coarse feature maps at various stages of the decoding process. This improves the network's understanding of global differences and reduces false-positive localizations. Our evaluation on standard image forensic benchmarks shows that GCA-Net can both compete against and improve over state-of-the-art networks by an average of 4.7% AUC. Additional ablation studies also demonstrate the method's robustness against attributions and resilience to false-positive predictions.
    Generalizing to Unseen Domains: A Survey on Domain Generalization. (arXiv:2103.03097v6 [cs.LG] UPDATED)
Machine learning systems generally assume that the training and testing distributions are the same. To this end, a key requirement is to develop models that can generalize to unseen distributions. Domain generalization (DG), i.e., out-of-distribution generalization, has attracted increasing interest in recent years. Domain generalization deals with a challenging setting where one or several different but related domain(s) are given, and the goal is to learn a model that can generalize to an unseen test domain. Considerable progress has been made in this area over the years. This paper presents the first review of recent advances in this area. First, we provide a formal definition of domain generalization and discuss several related fields. We then thoroughly review the theories related to domain generalization and carefully analyze the theory behind generalization. We categorize recent algorithms into three classes: data manipulation, representation learning, and learning strategy, and present several popular algorithms in detail for each category. Third, we introduce the commonly used datasets, applications, and our open-sourced codebase for fair evaluation. Finally, we summarize existing literature and present some potential research topics for the future.
    Group-based Distinctive Image Captioning with Memory Attention. (arXiv:2108.09151v4 [cs.CV] UPDATED)
Describing images using natural language is widely known as image captioning, which has made consistent progress due to the development of computer vision and natural language generation techniques. Though conventional captioning models achieve high accuracy based on popular metrics, e.g., BLEU, CIDEr, and SPICE, the ability of captions to distinguish the target image from other similar images is under-explored. To generate distinctive captions, a few pioneers employ contrastive learning or re-weight the ground-truth captions, focusing on a single input image. However, the relationships between objects in a similar image group (e.g., items or properties within the same album or fine-grained events) are neglected. In this paper, we improve the distinctiveness of image captions using a Group-based Distinctive Captioning Model (GdisCap), which compares each image with other images in one similar group and highlights the uniqueness of each image. In particular, we propose a group-based memory attention (GMA) module, which stores object features that are unique among the image group (i.e., with low similarity to objects in other images). These unique object features are highlighted when generating captions, resulting in more distinctive captions. Furthermore, the distinctive words in the ground-truth captions are selected to supervise the language decoder and GMA. Finally, we propose a new evaluation metric, distinctive word rate (DisWordRate), to measure the distinctiveness of captions. Quantitative results indicate that the proposed method significantly improves the distinctiveness of several baseline models, and achieves the state-of-the-art performance on both accuracy and distinctiveness. Results of a user study agree with the quantitative evaluation and demonstrate the rationality of the new metric DisWordRate.
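A back-of-the-envelope sketch of what a distinctiveness metric like DisWordRate might compute; the helper and its exact definition are hypothetical, as the paper defines the metric precisely:

```python
# Hedged sketch: fraction of a caption's words that do not appear in any
# caption of the other images in its similar-image group.

def dis_word_rate(caption, other_captions):
    words = caption.lower().split()
    others = {w for c in other_captions for w in c.lower().split()}
    if not words:
        return 0.0
    return sum(w not in others for w in words) / len(words)

rate = dis_word_rate("a red bird on a branch",
                     ["a bird on a tree", "small bird perched"])
print(round(rate, 3))  # 0.333: only "red" and "branch" are distinctive
```

A higher rate means the caption relies more on words that distinguish its image from visually similar ones, which standard n-gram metrics like BLEU do not reward.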
    Pretext Tasks selection for multitask self-supervised speech representation learning. (arXiv:2107.00594v4 [eess.AS] UPDATED)
Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In audio/speech signal processing, a wide range of features were engineered through decades of research efforts. As it turns out, learning to predict such features (a.k.a. pseudo-labels) has proven to be a particularly relevant pretext task, leading to useful self-supervised representations which prove to be effective for downstream tasks. However, methods and common practices for combining such pretext tasks for better performance on the downstream task have not been explored and understood properly. In fact, the process relies almost exclusively on a computationally heavy experimental procedure, which becomes intractable as the number of pretext tasks increases. This paper introduces a method to select a group of pretext tasks among a set of candidates. The method we propose estimates calibrated weights for the partial losses corresponding to the considered pretext tasks during the self-supervised training process. The experiments conducted on automatic speech recognition, speaker and emotion recognition validate our approach, as the groups selected and weighted with our method perform better than classic baselines, thus facilitating the selection and combination of relevant pseudo-labels for self-supervised representation learning.
    Omni-Training for Data-Efficient Deep Learning. (arXiv:2110.07510v2 [cs.LG] UPDATED)
    Learning a generalizable deep model from a few examples in a short time remains a major challenge of machine learning, which has impeded its wide deployment to many scenarios. Recent advances reveal that a properly pre-trained model endows an important property: transferability. A higher transferability of the learned representations indicates a better generalizability across domains of different distributions (domain transferability), or across tasks of different semantics (task transferability). Transferability has become the key to enable data-efficient deep learning, however, existing pre-training methods focus only on domain transferability while meta-training methods only on task transferability. This restricts their data-efficiency in downstream scenarios of diverging domains and tasks. A finding of this paper is that even a tight combination of pre-training and meta-training cannot achieve both kinds of transferability. This motivates the proposed Omni-Training framework towards data-efficient deep learning. Our first contribution is Omni-Net, a tri-flow architecture. Besides the joint representation flow, Omni-Net introduces two new parallel flows for pre-training and meta-training, respectively responsible for learning representations of domain transferability and task transferability. Omni-Net coordinates the parallel flows by routing them via the joint-flow, making each gain the other kind of transferability. Our second contribution is Omni-Loss, in which a self-distillation regularization is imposed to enable knowledge transfer across the training process. Omni-Training is a general framework that accommodates many existing pre-training and meta-training algorithms. A thorough evaluation on cross-task and cross-domain datasets in classification, regression and reinforcement learning problems shows that Omni-Training consistently outperforms the state-of-the-art methods.
    Constraints Penalized Q-learning for Safe Offline Reinforcement Learning. (arXiv:2107.09003v3 [cs.LG] UPDATED)
We study the problem of safe offline reinforcement learning (RL), where the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potentially large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We show that na\"ive approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions. We thus develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach can learn robustly across a variety of benchmark control tasks, outperforming several baselines.
    How to distribute data across tasks for meta-learning?. (arXiv:2103.08463v3 [cs.LG] UPDATED)
Meta-learning models transfer the knowledge acquired from previous tasks to quickly learn new ones. They are trained on benchmarks with a fixed number of data points per task. This number is usually arbitrary and it is unknown how it affects performance at testing. Since labelling of data is expensive, finding the optimal allocation of labels across training tasks may reduce costs. Given a fixed budget of labels, should we use a small number of highly labelled tasks, or many tasks with few labels each? Should we allocate more labels to some tasks and fewer to others? We show that: 1) If tasks are homogeneous, there is a uniform optimal allocation, whereby all tasks get the same amount of data; 2) At fixed budget, there is a trade-off between number of tasks and number of data points per task, with a unique solution for the optimum; 3) When trained separately, harder tasks should get more data, at the cost of a smaller number of tasks; 4) When training on a mixture of easy and hard tasks, more data should be allocated to easy tasks. Interestingly, Neuroscience experiments have shown that human visual skills also transfer better from easy tasks. We prove these results mathematically on mixed linear regression, and we show empirically that the same results hold for few-shot image classification on CIFAR-FS and mini-ImageNet. Our results provide guidance for allocating labels across tasks when collecting data for meta-learning.
    Quantum Machine Learning Framework for Virtual Screening in Drug Discovery: a Prospective Quantum Advantage. (arXiv:2204.04017v1 [quant-ph])
Machine Learning (ML) for Ligand Based Virtual Screening (LB-VS) is an important in-silico tool for discovering new drugs in a faster and cost-effective manner, especially for emerging diseases such as COVID-19. In this paper, we propose a general-purpose framework combining a classical Support Vector Classifier (SVC) algorithm with quantum kernel estimation for LB-VS on real-world databases, and we argue in favor of its prospective quantum advantage. Indeed, we heuristically prove that our quantum integrated workflow can, at least in some relevant instances, provide a tangible advantage compared to state-of-the-art classical algorithms operating on the same datasets, showing a strong dependence on the target and on the feature selection method. Finally, we test our algorithm on IBM Quantum processors using ADRB2 and COVID-19 datasets, showing that hardware simulations provide results in line with the predicted performances and can surpass classical equivalents.
    KCD: Knowledge Walks and Textual Cues Enhanced Political Perspective Detection in News Media. (arXiv:2204.04046v1 [cs.LG])
Political perspective detection has become an increasingly important task that can help combat echo chambers and political polarization. Previous approaches generally focus on leveraging textual content to identify stances, while they fail to reason with background knowledge or leverage the rich semantic and syntactic textual labels in news articles. In light of these limitations, we propose KCD, a political perspective detection approach that enables multi-hop knowledge reasoning and incorporates textual cues as paragraph-level labels. Specifically, we first generate random walks on external knowledge graphs and infuse them with news text representations. We then construct a heterogeneous information network to jointly model news content as well as semantic, syntactic and entity cues in news articles. Finally, we adopt relational graph neural networks for graph-level representation learning and conduct political perspective detection. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on two benchmark datasets. We further examine the effect of knowledge walks and textual cues and how they contribute to our approach's data efficiency.
    Predicting Berth Stay for Tanker Terminals: A Systematic and Dynamic Approach. (arXiv:2204.04085v1 [cs.CE])
Given the trend of digitization and the increasing volume of maritime transport, predicting vessel berth stay has become a requirement for operations research and scheduling optimization in the era of maritime big data, and plays a significant part in port efficiency and maritime logistics enhancement. This study proposes a systematic and dynamic approach for predicting berth stay at tanker terminals. The approach covers three innovative aspects: 1) The data sources employed are multi-faceted, including cargo operation data from tanker terminals, time-series data from the automatic identification system (AIS), etc. 2) The process of berth stay is decomposed into multiple blocks according to data analysis and information extraction, and practical operation scenarios are developed accordingly. 3) Predictive models of berth stay are developed on the basis of the prior data analysis and information extraction under two methods, regression and decomposed distribution. The models are evaluated under four dynamic scenarios with certain designated cargoes at two different terminals. The evaluation results show that the proposed approach can predict berth stay with accuracy up to 98.81%, validated against historical baselines, and also demonstrate that the proposed approach has the dynamic capability of predicting berth stay across the scenarios. The model may be potentially applied for short-term pilot-booking or scheduling optimization within a reasonable time frame for advancement of port intelligence and logistics efficiency.
    Global Update Guided Federated Learning. (arXiv:2204.03920v1 [cs.LG])
    Federated learning protects data privacy and security by exchanging models instead of data. However, unbalanced data distributions among participating clients compromise the accuracy and convergence speed of federated learning algorithms. To alleviate this problem, unlike previous studies that limit the distance of updates for local models, we propose global-update-guided federated learning (FedGG), which introduces a model-cosine loss into local objective functions, so that local models can fit local data distributions under the guidance of the update directions of global models. Furthermore, considering that the update direction of a global model is informative in the early stage of training, we propose adaptive loss weights based on the update distances of local models. Numerical simulations show that, compared with other advanced algorithms, FedGG significantly improves model convergence accuracy and speed. Additionally, compared with traditional fixed loss weights, adaptive loss weights make our algorithm more stable and easier to implement in practice.
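    The model-cosine idea can be illustrated with a small sketch. This is not the paper's implementation; the flattened parameter vectors, the weight `mu`, and the exact form of the penalty are assumptions for illustration:

```python
import numpy as np

def model_cosine_loss(local_params, global_params, prev_global_params, mu=1.0):
    """Penalize local updates that point away from the global update direction.

    Illustrative sketch of a model-cosine regularizer of the kind FedGG
    describes; parameters are assumed flattened into 1-D arrays.
    """
    # Direction the global model moved in the last round.
    global_update = global_params - prev_global_params
    # Direction the local model has moved away from the current global model.
    local_update = local_params - global_params
    cos = np.dot(local_update, global_update) / (
        np.linalg.norm(local_update) * np.linalg.norm(global_update) + 1e-12)
    # Loss is 0 when the directions agree, up to 2*mu when they oppose.
    return mu * (1.0 - cos)
```

    In a client's local objective, such a term would be added to the task loss so that gradient steps are steered toward the global update direction.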
    Ontology Matching Through Absolute Orientation of Embedding Spaces. (arXiv:2204.04040v1 [cs.AI])
    Ontology matching is a core task when creating interoperable and linked open datasets. In this paper, we explore a novel structure-based mapping approach based on knowledge graph embeddings: the ontologies to be matched are embedded, and an approach known as absolute orientation is used to align the two embedding spaces. Alongside the approach, the paper presents a first, preliminary evaluation using synthetic and real-world datasets. In experiments with synthetic data, we find that the approach works very well on similarly structured graphs; it handles alignment noise better than size and structural differences in the ontologies.
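    Absolute orientation is classically solved in closed form via the SVD (orthogonal Procrustes). A minimal sketch, under the assumption that anchor points in the two embedding spaces are given as row-aligned matrices; the paper's exact handling of translation and scaling may differ:

```python
import numpy as np

def align_embeddings(source, target):
    """Rotate `source` onto `target` given row-aligned anchor pairs.

    Sketch of SVD-based absolute orientation: center both point sets,
    then take the optimal rotation from the SVD of the cross-covariance.
    """
    src_c = source - source.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(src_c.T @ tgt_c)
    rotation = u @ vt
    # Map the source points into the target space.
    return src_c @ rotation + target.mean(axis=0)
```

    Once the spaces are aligned, candidate matches can be read off by nearest-neighbor search between the two embedding sets.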
    Self-supervised Speaker Diarization. (arXiv:2204.04166v1 [cs.SD])
    Over the last few years, deep learning has grown in popularity for speaker verification, identification, and diarization. Inarguably, a significant part of this success is due to the demonstrated effectiveness of their speaker representations. These, however, are heavily dependent on large amounts of annotated data and can be sensitive to new domains. This study proposes an entirely unsupervised deep-learning model for speaker diarization. Specifically, the study focuses on generating high-quality neural speaker representations without any annotated data, as well as on estimating secondary hyperparameters of the model without annotations. The speaker embeddings are represented by an encoder trained in a self-supervised fashion using pairs of adjacent segments assumed to be of the same speaker. The trained encoder model is then used to self-generate pseudo-labels to subsequently train a similarity score between different segments of the same call using probabilistic linear discriminant analysis (PLDA) and further to learn a clustering stopping threshold. We compared our model to state-of-the-art unsupervised as well as supervised baselines on the CallHome benchmarks. According to empirical results, our approach outperforms unsupervised methods when only two speakers are present in the call, and is only slightly worse than recent supervised models.
    C-NMT: A Collaborative Inference Framework for Neural Machine Translation. (arXiv:2204.04043v1 [cs.LG])
    Collaborative Inference (CI) optimizes the latency and energy consumption of deep learning inference through the inter-operation of edge and cloud devices. Albeit beneficial for other tasks, CI has never been applied to the sequence-to-sequence mapping problem at the heart of Neural Machine Translation (NMT). In this work, we address the specific issues of collaborative NMT, such as estimating the latency required to generate the (unknown) output sequence, and show how existing CI methods can be adapted to these applications. Our experiments show that CI can reduce the latency of NMT by up to 44% compared to a non-collaborative approach.
    Transfer Attacks Revisited: A Large-Scale Empirical Study in Real Computer Vision Settings. (arXiv:2204.04063v1 [cs.CV])
    One intriguing property of adversarial attacks is their "transferability" -- an adversarial example crafted with respect to one deep neural network (DNN) model is often found effective against other DNNs as well. Intensive research has been conducted on this phenomenon under simplistic controlled conditions. Yet, thus far, there is still a lack of comprehensive understanding of transferability-based attacks ("transfer attacks") in real-world environments. To bridge this critical gap, we conduct the first large-scale systematic empirical study of transfer attacks against major cloud-based MLaaS platforms, taking the components of a real transfer attack into account. The study leads to a number of interesting findings which are inconsistent with existing ones, including: (1) Simple surrogates do not necessarily improve real transfer attacks. (2) No dominant surrogate architecture is found in real transfer attacks. (3) It is the gap between posteriors (output of the softmax layer) rather than the gap between logits (the so-called $\kappa$ value) that increases transferability. Moreover, by comparing with prior works, we demonstrate that transfer attacks possess many previously unknown properties in real-world environments, such as: (1) Model similarity is not a well-defined concept. (2) The $L_2$ norm of the perturbation can generate high transferability without using gradients and is a more powerful source than the $L_\infty$ norm. We believe this work sheds light on the vulnerabilities of popular MLaaS platforms and points to a few promising research directions.
    EPASAD: Ellipsoid decision boundary based Process-Aware Stealthy Attack Detector. (arXiv:2204.04154v1 [cs.CR])
    Due to the importance of Critical Infrastructure (CI) in a nation's economy, CIs have been lucrative targets for cyber attackers. These critical infrastructures are usually Cyber-Physical Systems (CPS) such as power grids, water and sewage treatment facilities, oil and gas pipelines, etc. In recent times, these systems have suffered from cyber attacks numerous times. Researchers have been developing cyber security solutions for CIs to avoid lasting damage. According to standard frameworks, cyber security based on identification, protection, detection, response, and recovery is at the core of this research. Detection of an ongoing attack that escapes standard protection such as firewalls, anti-virus software, and host/network intrusion detection has gained importance, as such attacks eventually affect the physical dynamics of the system. Therefore, anomaly detection in physical dynamics proves an effective means of implementing defense-in-depth. PASAD is one example of anomaly detection in sensor/actuator data, which represents such systems' physical dynamics. We present EPASAD, which improves the detection technique used in PASAD to detect micro-stealthy attacks that, as our experiments show, PASAD's spherical boundary-based detection fails to detect. EPASAD overcomes this by using ellipsoid boundaries, thereby tightening the boundaries in various dimensions, whereas a spherical boundary treats all dimensions equally. We validate EPASAD using the dataset produced by the TE-process simulator and the C-town datasets. The results show that EPASAD improves PASAD's average recall by 5.8% and 9.5% for the two datasets, respectively.
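    The core contrast between a spherical and an ellipsoidal boundary can be sketched with the Mahalanobis distance, which tightens the boundary along low-variance dimensions. This is an illustrative simplification, not EPASAD's actual detector (which builds on PASAD's signal-subspace machinery); the threshold rule is an assumption:

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    # Distance that scales each dimension by the data's (co)variance.
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def fit_boundary(normal_data):
    """Fit an ellipsoidal boundary to normal-operation sensor data.

    Threshold = largest Mahalanobis distance seen during normal operation
    (a simple illustrative choice).
    """
    mean = normal_data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(normal_data, rowvar=False))
    threshold = max(mahalanobis(x, mean, cov_inv) for x in normal_data)
    return mean, cov_inv, threshold

def is_anomalous(x, mean, cov_inv, threshold):
    return mahalanobis(x, mean, cov_inv) > threshold
```

    A sphere with one radius would miss a small deviation in a low-variance dimension; the ellipsoid flags it because that dimension's variance is tiny.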
    Optimizing Coordinative Schedules for Tanker Terminals: An Intelligent Large Spatial-Temporal Data-Driven Approach -- Part 2. (arXiv:2204.03955v1 [cs.CE])
    In this study, a novel coordinative scheduling optimization approach is proposed to enhance port efficiency by reducing weighted average turnaround time. The proposed approach is developed as a heuristic algorithm, applied and investigated over different observation windows under a weekly rolling-horizon paradigm. The experimental results show that the proposed approach is effective and promising at mitigating vessel turnaround time. The largest potential savings in (weighted average) turnaround time are around a 17-hour (28%) reduction on the 1-week observation baseline, a 45-hour (37%) reduction on the 2-week observation baseline, and a 70-hour (40%) reduction on the 3-week observation baseline. Even though the experimental results are based on historical datasets, they suggest significant potential benefits if real-time applications were deployed, at quadratic computational complexity.
    Labeling-Free Comparison Testing of Deep Learning Models. (arXiv:2204.03994v1 [cs.LG])
    Various deep neural networks (DNNs) have been developed and reported for their tremendous success in multiple domains. Given a specific task, developers can collect massive DNNs from public sources for efficient reuse and avoid redundant work from scratch. However, testing the performance (e.g., accuracy and robustness) of multiple DNNs and giving a reasonable recommendation as to which model should be used is challenging given the scarcity of labeled data and the demand for domain expertise. Existing testing approaches are mainly selection-based: after sampling, a few of the test data are labeled to discriminate between DNNs. Therefore, due to the randomness of sampling, the performance ranking is not deterministic. In this paper, we propose a labeling-free comparison testing approach to overcome the limitations of labeling effort and sampling randomness. The main idea is to learn a Bayesian model to infer the models' specialty based only on predicted labels. To evaluate the effectiveness of our approach, we undertook exhaustive experiments on 9 benchmark datasets spanning the domains of image, text, and source code, and 165 DNNs. In addition to accuracy, we consider robustness against synthetic and natural distribution shifts. The experimental results demonstrate that the performance of existing approaches degrades under distribution shifts. Our approach outperforms the baseline methods by up to 0.74 and 0.53 on Spearman's correlation and Kendall's $\tau$, respectively, regardless of the dataset and distribution shift. Additionally, we investigated the impact of model quality (accuracy and robustness) and diversity (standard deviation of the quality) on testing effectiveness and observed that there is a higher chance of a good result when the quality is over 50% and the diversity is larger than 18%.
    ECG Biometric Recognition: Review, System Proposal, and Benchmark Evaluation. (arXiv:2204.03992v1 [cs.LG])
    Electrocardiograms (ECGs) have shown unique patterns to distinguish between different subjects and present important advantages compared to other biometric traits, such as difficulty of counterfeiting, liveness detection, and ubiquity. Also, with the success of Deep Learning technologies, ECG biometric recognition has received increasing interest in recent years. However, it is not easy to evaluate the improvements of newly proposed ECG methods, mainly due to the lack of public data and standard experimental protocols. In this study, we perform an extensive analysis and comparison of different scenarios in ECG biometric recognition. Both verification and identification tasks are investigated, as well as single- and multi-session scenarios. Finally, we also perform single- and multi-lead ECG experiments, considering traditional scenarios using electrodes on the chest and limbs and current user-friendly wearable devices. In addition, we present ECGXtractor, a robust Deep Learning technology trained with an in-house large-scale database and able to operate successfully across various scenarios and multiple databases. We introduce our proposed feature extractor, trained with multiple sinus-rhythm heartbeats belonging to 55,967 subjects, and provide a general public benchmark evaluation with a detailed experimental protocol. We evaluate system performance over four different databases: i) our in-house database, ii) PTB, iii) ECG-ID, and iv) CYBHi. With the widely used PTB database, we achieve Equal Error Rates of 0.14% and 2.06% in verification, and accuracies of 100% and 96.46% in identification, respectively, in single- and multi-session analysis. We release the source code, experimental protocol details, and pre-trained models on GitHub to advance the field.
    Does Robustness on ImageNet Transfer to Downstream Tasks?. (arXiv:2204.03934v1 [cs.CV])
    As clean ImageNet accuracy nears its ceiling, the research community is increasingly concerned about robust accuracy under distributional shifts. While a variety of methods have been proposed to robustify neural networks, these techniques often target models trained on ImageNet classification. At the same time, it is common practice to use ImageNet-pretrained backbones for downstream tasks such as object detection, semantic segmentation, and image classification from different domains. This raises a question: can these robust image classifiers transfer robustness to downstream tasks? For object detection and semantic segmentation, we find that a vanilla Swin Transformer, a variant of the Vision Transformer tailored for dense prediction tasks, transfers robustness better than Convolutional Neural Networks that are trained to be robust to the corrupted version of ImageNet. For CIFAR10 classification, we find that models robustified for ImageNet do not retain robustness when fully fine-tuned. These findings suggest that current robustification techniques tend to emphasize ImageNet evaluations. Moreover, network architecture is a strong source of robustness when we consider transfer learning.
    SnapMode: An Intelligent and Distributed Large-Scale Fashion Image Retrieval Platform Based On Big Data and Deep Generative Adversarial Network Technologies. (arXiv:2204.03998v1 [cs.IR])
    Fashion is now among the largest industries worldwide, for it represents human history and helps tell the world's story. As a result of the Fourth Industrial Revolution, the Internet has become an increasingly important source of fashion information. However, with a growing number of web pages and social data, it is nearly impossible for humans to manually keep up with the ongoing evolution and continuously varying content in this domain. The proper management and exploitation of big data can pave the way for substantial growth of the global economy as well as citizen satisfaction. Therefore, computer scientists have taken on the challenge of handling e-commerce fashion websites using big data and machine learning technologies. This paper first proposes a scalable focused Web Crawler engine based on distributed computing platforms to extract and process fashion data on e-commerce websites. The role of the proposed platform is then described in developing a disentangled feature extraction method that employs deep convolutional generative adversarial networks (DCGANs) for content-based image indexing and retrieval. Finally, state-of-the-art solutions are compared, and the results of the proposed approach are analyzed on a standard dataset. For a real-life implementation of the proposed solution, a Web-based application is developed on the Apache Storm, Kafka, Solr, and Milvus platforms to create a fashion search engine called SnapMode.
    Channel model for end-to-end learning of communications systems: A survey. (arXiv:2204.03944v1 [cs.LG])
    The traditional communication model based on a chain of multiple independent processing blocks constrains efficiency and introduces artificial barriers; each individually optimized block does not guarantee end-to-end performance of the system. Recently, end-to-end learning of communications systems through machine learning (ML) has been proposed to optimize the system metrics jointly over all components. These methods show performance improvements but have the limitation of requiring a differentiable channel model. In this study, we summarize the existing approaches that alleviate this problem. We believe that this study will provide a better understanding of the topic and insight into future research in this field.
    Mel-spectrogram features for acoustic vehicle detection and speed estimation. (arXiv:2204.04013v1 [cs.LG])
    The paper addresses acoustic vehicle detection and speed estimation from single-sensor measurements. We predict the vehicle's pass-by instant by minimizing the clipped vehicle-to-microphone distance, which is predicted from the mel-spectrogram of input audio in a supervised learning approach. In addition, mel-spectrogram-based features are used directly for vehicle speed estimation, without introducing any intermediate features. The results show that the proposed features can be used for accurate vehicle detection and speed estimation, with an average error of 7.87 km/h. If we formulate speed estimation as a classification problem with a 10 km/h discretization interval, the proposed method attains an average accuracy of 48.7% for correct class prediction and 91.0% when an offset of one class is allowed. The proposed method is evaluated on a dataset of 304 urban-environment on-field recordings of ten different vehicles.
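    The 10 km/h discretization and the one-class-offset accuracy metric can be sketched as follows (the speed values below are hypothetical, not from the paper's dataset):

```python
def to_speed_class(speed_kmh, interval=10):
    # Discretize speed into classes: 0-10 km/h -> 0, 10-20 -> 1, ...
    return int(speed_kmh // interval)

def accuracy_within_offset(true_speeds, predicted_speeds, offset=0):
    """Fraction of predictions whose class is within `offset` classes
    of the true class; offset=0 is exact-class accuracy."""
    true_cls = [to_speed_class(s) for s in true_speeds]
    pred_cls = [to_speed_class(s) for s in predicted_speeds]
    hits = sum(abs(t - p) <= offset for t, p in zip(true_cls, pred_cls))
    return hits / len(true_cls)
```

    For example, a prediction of 47 km/h against a true speed of 52 km/h misses the exact class (class 4 vs. 5) but counts as correct under a one-class offset.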
    The Complexity of Markov Equilibrium in Stochastic Games. (arXiv:2204.03991v1 [cs.LG])
    We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable, even when there are two players, the game is turn-based, the discount factor is an absolute constant, and the approximation is an absolute constant. Our intractability results stand in sharp contrast to normal-form games where exact CCEs are efficiently computable. A fortiori, our results imply that there are no efficient algorithms for learning stationary Markov CCE policies in multi-agent reinforcement learning (MARL), even when the interaction is two-player and turn-based, and both the discount factor and the desired approximation of the learned policies is an absolute constant. In turn, these results stand in sharp contrast to single-agent reinforcement learning (RL) where near-optimal stationary Markov policies can be efficiently learned. Complementing our intractability results for stationary Markov CCEs, we provide a decentralized algorithm (assuming shared randomness among players) for learning a nonstationary Markov CCE policy with polynomial time and sample complexity in all problem parameters. Previous work for learning Markov CCE policies all required exponential time and sample complexity in the number of players.
    DiversiTree: Computing Diverse Sets of Near-Optimal Solutions to Mixed-Integer Optimization Problems. (arXiv:2204.03822v1 [cs.DM])
    While most methods for solving mixed-integer optimization problems seek a single optimal solution, finding a diverse set of near-optimal solutions can often be more useful. State-of-the-art methods for generating diverse near-optimal solutions usually take a two-phase approach, first finding a set of near-optimal solutions and then finding a diverse subset. In contrast, we present a method of finding a set of diverse solutions by emphasizing diversity within the search for near-optimal solutions. Specifically, within a branch-and-bound framework, we investigate parameterized node selection rules that explicitly consider diversity. Our results indicate that our approach significantly increases the diversity of the final solution set. When compared with existing methods for finding diverse near-optimal sets, our method runs with similar run-time to regular node selection methods and gives a diversity improvement of up to 140%. In contrast, popular node selection rules such as best-first search give an improvement of no more than 40%. Further, we find that our method is most effective when diversity is emphasized more in node selection deeper in the tree and when the solution set has grown large enough.
    Optimizing Coordinative Schedules for Tanker Terminals: An Intelligent Large Spatial-Temporal Data-Driven Approach -- Part 1. (arXiv:2204.03899v1 [cs.CE])
    In this study, a novel coordinative scheduling optimization approach is proposed to enhance port efficiency by reducing average wait time and turnaround time. The proposed approach consists of enhanced particle swarm optimization (ePSO) as the kernel and an augmented firefly algorithm (AFA) for global optimal search. Two paradigm methods of the proposed approach are investigated: a batch method and a rolling horizon method. The experimental results show that both paradigm methods can effectively enhance port efficiency. The average wait time could be significantly reduced by 86.0% - 95.5%, and the average turnaround time could eventually be reduced by 38.2% - 42.4% with respect to historical benchmarks. Moreover, the rolling horizon method could reduce running time to 20 minutes over 3-month datasets, rather than the 4 hours of the batch method at corresponding maximum performance.
    Network Shuffling: Privacy Amplification via Random Walks. (arXiv:2204.03919v1 [cs.CR])
    Recently, it has been shown that shuffling can amplify the central differential privacy guarantees of data randomized with local differential privacy. In this setup, a centralized, trusted shuffler is responsible for shuffling by keeping the identities of the data anonymous, which subsequently leads to stronger privacy guarantees for systems. However, introducing a centralized entity to the originally local privacy model loses some of the appeal of not having any centralized entity, as in local differential privacy. Moreover, implementing a shuffler in a reliable way is not trivial due to known security issues and/or requirements for advanced hardware or secure computation technology. Motivated by these practical considerations, we rethink the shuffle model to relax the assumption of a centralized, trusted shuffler. We introduce network shuffling, a decentralized mechanism where users exchange data in a random-walk fashion on a network/graph, as an alternative for achieving privacy amplification via anonymity. We analyze the threat model under such a setting, and propose distributed protocols of network shuffling that are straightforward to implement in practice. Furthermore, we show that the privacy amplification rate is similar to that of other privacy amplification techniques such as uniform shuffling. To the best of our knowledge, among the recently studied intermediate trust models that leverage privacy amplification techniques, our work is the first that does not rely on any centralized entity to achieve privacy amplification.
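    The random-walk exchange can be sketched as below. This is only an illustration of the mechanism, not the paper's protocol; the graph, walk length, and the rule that the terminal node submits the report are assumptions:

```python
import random

def network_shuffle(values, adjacency, steps, rng=None):
    """Each user's (randomized) report performs a random walk on the user
    graph; whoever holds it after `steps` hops submits it to the server.

    `values` maps user -> report, `adjacency` maps user -> list of neighbors.
    """
    rng = rng or random.Random(0)
    final_holder = {}
    for user, value in values.items():
        node = user
        for _ in range(steps):
            node = rng.choice(adjacency[node])
        final_holder.setdefault(node, []).append(value)
    return final_holder
```

    The server sees which node submitted each report but, after enough hops, gains little information about which node originated it, which is the anonymity the amplification argument relies on.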
    Study of a committee of neural networks for biometric hand-geometry recognition. (arXiv:2204.03935v1 [cs.CV])
    This paper studies different committees of neural networks for biometric pattern recognition. We use the neural nets as classifiers for identification and verification purposes. We show that a committee of nets can improve the recognition rates when compared with a multi-start initialization algorithm that just picks the neural net which offers the best performance. On the other hand, we found that there is no strong correlation between identification and verification applications using the same classifier.
    Disability prediction in multiple sclerosis using performance outcome measures and demographic data. (arXiv:2204.03969v1 [cs.LG])
    Literature on machine learning for multiple sclerosis has primarily focused on the use of neuroimaging data, such as magnetic resonance imaging and clinical laboratory tests, for disease identification. However, studies have shown that these modalities are not consistent with disease activity such as symptoms or disease progression. Furthermore, the cost of collecting data from these modalities is high, leading to scarce evaluations. In this work, we used multi-dimensional, affordable, physical and smartphone-based performance outcome measures (POMs) in conjunction with demographic data to predict multiple sclerosis disease progression. We performed a rigorous benchmarking exercise on two datasets and present results across 13 clinically actionable prediction endpoints and 6 machine learning models. To the best of our knowledge, our results are the first to show that it is possible to predict disease progression using POMs and demographic data in the context of both clinical trials and smartphone-based studies, by using two datasets. Moreover, we investigate our models to understand the impact of different POMs and demographics on model performance through feature ablation studies. We also show that model performance is similar across different demographic subgroups (based on age and sex). To enable this work, we developed an end-to-end reusable pre-processing and machine learning framework which allows quicker experimentation over disparate MS datasets.
    KGI: An Integrated Framework for Knowledge Intensive Language Tasks. (arXiv:2204.03985v1 [cs.CL])
    In a recent work, we presented a novel state-of-the-art approach to zero-shot slot filling that extends dense passage retrieval with hard negatives and robust training procedures for retrieval augmented generation models. In this paper, we propose a system based on an enhanced version of this approach, where we train task-specific models for other knowledge intensive language tasks, such as open domain question answering (QA), dialogue, and fact checking. Our system achieves results comparable to the best models on the KILT leaderboards. Moreover, given a user query, we show how the output from these different models can be combined to cross-examine each other. In particular, we show how accuracy in dialogue can be improved using the QA model. A short video demonstrating the system is available here: https://ibm.box.com/v/kgi-interactive-demo
    Controllable Missingness from Uncontrollable Missingness: Joint Learning Measurement Policy and Imputation. (arXiv:2204.03872v1 [cs.LG])
    Due to the cost or interference of measurement, we need to control the measurement system. Assuming that each variable can be measured sequentially, there exists an optimal policy for choosing the next measurement given the previous observations. Though the optimal measurement policy actually depends on the goal of measurement, we mainly focus on retrieving complete data, known as imputation. We also adapt the imputation method to missingness that varies with the measurement policy. However, learning the measurement policy and imputation requires complete data, which is, unfortunately, impossible to observe. To tackle this problem, we propose a data generation method and a joint learning algorithm. The main ideas are that 1) the data generation method is inherited from the imputation method, and 2) the adaptation of imputation encourages the measurement policy to learn more than individual learning would. We implemented several variations of the proposed algorithm for two different datasets and various missing rates. The experimental results demonstrate that our algorithm is generally applicable and outperforms baseline methods.
    CD$^2$-pFed: Cyclic Distillation-guided Channel Decoupling for Model Personalization in Federated Learning. (arXiv:2204.03880v1 [cs.CV])
    Federated learning (FL) is a distributed learning paradigm that enables multiple clients to collaboratively learn a shared global model. Despite the recent progress, it remains challenging to deal with heterogeneous data clients, as the discrepant data distributions usually prevent the global model from delivering good generalization ability on each participating client. In this paper, we propose CD^2-pFed, a novel Cyclic Distillation-guided Channel Decoupling framework, to personalize the global model in FL under various settings of data heterogeneity. Different from previous works, which establish layer-wise personalization to overcome the non-IID data across different clients, we make the first attempt at channel-wise assignment for model personalization, referred to as channel decoupling. To further facilitate the collaboration between private and shared weights, we propose a novel cyclic distillation scheme to impose a consistent regularization between the local and global model representations during the federation. Guided by the cyclic distillation, our channel decoupling framework can deliver more accurate and generalized results for different kinds of heterogeneity, such as feature skew, label distribution skew, and concept shift. Comprehensive experiments on four benchmarks, including natural image and medical image analysis tasks, demonstrate the consistent effectiveness of our method on both local and external validations.
    Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment. (arXiv:2204.04016v1 [eess.AS])
    Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and labor-intensive assessments. In this work, we investigate a novel approach for obtaining such a measure using the divergence in disentangled latent speech representations of a parallel utterance pair, obtained from a healthy reference and a pathological speaker. Experiments on an English database of Cerebral Palsy patients, using all available utterances per speaker, show high and significant correlation values (R = -0.9) with subjective intelligibility measures, while having only minimal deviation (±0.01) across four different reference speaker pairs. We also demonstrate the robustness of the proposed method (R = -0.89, deviating ±0.02 over 1000 iterations) by considering a significantly smaller amount of utterances per speaker. Our results are among the first to show that disentangled speech representations can be used for automatic pathological speech intelligibility assessment, resulting in a reference speaker pair invariant method, applicable in scenarios with only few utterances available.
    Exploring the Universality of Hadronic Jet Classification. (arXiv:2204.03812v1 [hep-ph])
    The modeling of jet substructure significantly differs between Parton Shower Monte Carlo (PSMC) programs. Despite this, we observe that machine learning classifiers trained on different PSMCs learn nearly the same function. This means that when these classifiers are applied to the same PSMC for testing, they result in nearly the same performance. This classifier universality indicates that a machine learning model trained on one simulation and tested on another simulation (or data) will likely be optimal. Our observations are based on detailed studies of shallow and deep neural networks applied to simulated Lorentz boosted Higgs jet tagging at the LHC.
    SuperNet in Neural Architecture Search: A Taxonomic Survey. (arXiv:2204.03916v1 [cs.CV])
    Deep Neural Networks (DNNs) have made significant progress in a wide range of visual recognition tasks such as image classification, object detection, and semantic segmentation. The evolution of convolutional architectures has led to better performance at the cost of expensive computation. In addition, network design has become a difficult task, which is labor-intensive and requires a high level of domain knowledge. To mitigate such issues, there have been studies on a variety of neural architecture search methods that automatically search for optimal architectures, achieving models with impressive performance that outperform human-designed counterparts. This survey aims to provide an overview of existing works in this field of research, and specifically focuses on supernet optimization, which builds a neural network that assembles all the architectures as its sub-models by using weight sharing. We accomplish this by categorizing supernet optimization methods as solutions to the common challenges found in the literature: data-side optimization, poor rank correlation alleviation, and transferable NAS for a number of deployment scenarios.
    Data-Driven Evaluation of Training Action Space for Reinforcement Learning. (arXiv:2204.03840v1 [cs.LG])
    Training action space selection for reinforcement learning (RL) is conflict-prone due to complex state-action relationships. To address this challenge, this paper proposes a Shapley-inspired methodology for training action space categorization and ranking. To reduce the exponential-time Shapley computations, the methodology includes a Monte Carlo simulation that avoids unnecessary explorations. The effectiveness of the methodology is illustrated using a cloud infrastructure resource tuning case study. It reduces the search space by 80% and categorizes the training action sets into dispensable and indispensable groups. Additionally, it ranks different training actions to facilitate high-performance yet cost-efficient RL model design. The proposed data-driven methodology is extensible to different domains, use cases, and reinforcement learning algorithms.  ( 2 min )
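    The Shapley-style categorization and ranking described above can be sketched as a permutation-sampling estimator. The toy additive value function below is a hypothetical stand-in for evaluating an RL training run on a subset of actions; names and parameters are assumptions, not the paper's implementation.

```python
import random

def mc_shapley(actions, value_fn, n_samples=200, seed=0):
    """Estimate each action's Shapley value by sampling random
    permutations and averaging its marginal contribution."""
    rng = random.Random(seed)
    phi = {a: 0.0 for a in actions}
    for _ in range(n_samples):
        perm = list(actions)
        rng.shuffle(perm)
        coalition = []
        prev = value_fn(coalition)
        for a in perm:
            coalition.append(a)
            cur = value_fn(coalition)
            phi[a] += cur - prev
            prev = cur
    return {a: v / n_samples for a, v in phi.items()}

# Hypothetical value function: "a" and "b" are indispensable, "c" is dispensable.
def value(coalition):
    return 1.0 * ("a" in coalition) + 0.5 * ("b" in coalition)

ranking = mc_shapley(["a", "b", "c"], value)
```

    Actions with near-zero estimated value would fall into the dispensable group; sampling permutations keeps the cost linear in `n_samples` rather than exponential in the number of actions.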
    Engagement Detection with Multi-Task Training in E-Learning Environments. (arXiv:2204.04020v1 [cs.CV])
    Recognition of user interaction, in particular engagement detection, has become crucial for online working and learning environments, especially during the COVID-19 outbreak. Such recognition and detection systems significantly improve the user experience and efficiency by providing valuable feedback. In this paper, we propose a novel Engagement Detection with Multi-Task Training (ED-MTT) system which minimizes mean squared error and triplet loss together to determine the engagement level of students in an e-learning environment. The performance of this system is evaluated and compared against the state-of-the-art on a publicly available dataset as well as videos collected from real-life scenarios. The results show that ED-MTT achieves 6% lower MSE than the best state-of-the-art performance with highly acceptable training time and lightweight feature extraction.
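    The ED-MTT objective combines a regression loss with a triplet loss. A minimal NumPy sketch of such a weighted multi-task loss follows; the weighting factor `alpha` and the plain-vector formulation are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on embedding vectors."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def multitask_loss(pred, target, anchor, positive, negative, alpha=0.5):
    """Weighted sum of regression MSE and embedding triplet loss."""
    mse = float(np.mean((pred - target) ** 2))
    return mse + alpha * triplet_loss(anchor, positive, negative)

# Toy example: MSE = 2.0, triplet term already satisfies the margin (0.0).
loss_val = multitask_loss(
    pred=np.array([1.0, 2.0]), target=np.array([1.0, 0.0]),
    anchor=np.zeros(2), positive=np.zeros(2), negative=np.array([3.0, 4.0]),
)
```

    Training both heads jointly lets the embedding act as a regularizer for the engagement regressor.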
    Blockchain as an Enabler for Transfer Learning in Smart Environments. (arXiv:2204.03959v1 [cs.AI])
    The knowledge embodied in machine learning models for intelligent systems is commonly associated with time-consuming and costly processes such as large-scale data collection, data labelling, network training, and fine-tuning of models. Sharing and reuse of these elaborated models between intelligent systems deployed in different environments, which is known as transfer learning, would facilitate the adoption of services for users and accelerate the uptake of intelligent systems in environments such as smart building and smart city applications. In this context, the communication and knowledge exchange between AI-enabled environments depend on complicated networks of systems, systems of systems, digital assets, and their chains of dependencies, which hardly follow the centralized schema of traditional information systems. Rather, they require an adaptive decentralized system architecture that is empowered by features such as data provenance, workflow transparency, and validation of process participants. In this research, we propose a decentralized and adaptive software framework based on blockchain and knowledge graph technologies that supports knowledge exchange and interoperability between IoT-enabled environments in a transparent and trustworthy way.  ( 2 min )
    Does the Market of Citations Reward Reproducible Work?. (arXiv:2204.03829v1 [cs.DL])
    The field of bibliometrics, studying citations and behavior, is critical to the discussion of reproducibility. Citations are one of the primary incentive and reward systems for academic work, and so we desire to know if this incentive rewards reproducible work. Yet to the best of our knowledge, only one work has attempted to look at this combined space, concluding that non-reproducible work is more highly cited. We show that answering this question is more challenging than first proposed, and subtle issues can inhibit a robust conclusion. To make inferences with more robust behavior, we propose a hierarchical Bayesian model that incorporates the citation rate over time, rather than the total number of citations after a fixed amount of time. In doing so we show that, under current evidence, the answer is more likely that certain fields of study, such as Medicine and Machine Learning (ML), do correlate reproducible works with more citations, but other fields appear to have no relationship. Further, we find that making code available and thoroughly referencing prior works also appear to positively correlate with increased citations. Our code and data can be found at https://github.com/EdwardRaff/ReproducibleCitations .  ( 2 min )
    A posteriori learning for quasi-geostrophic turbulence parametrization. (arXiv:2204.03911v1 [physics.flu-dyn])
    The use of machine learning to build subgrid parametrizations for climate models is receiving growing attention. State-of-the-art strategies address the problem as a supervised learning task and optimize algorithms that predict subgrid fluxes based on information from coarse resolution models. In practice, training data are generated from higher resolution numerical simulations transformed in order to mimic coarse resolution simulations. By essence, these strategies optimize subgrid parametrizations to meet so-called $\textit{a priori}$ criteria. But the actual purpose of a subgrid parametrization is to obtain good performance in terms of $\textit{a posteriori}$ metrics which imply computing entire model trajectories. In this paper, we focus on the representation of energy backscatter in two dimensional quasi-geostrophic turbulence and compare parametrizations obtained with different learning strategies at fixed computational complexity. We show that strategies based on $\textit{a priori}$ criteria yield parametrizations that tend to be unstable in direct simulations and describe how subgrid parametrizations can alternatively be trained end-to-end in order to meet $\textit{a posteriori}$ criteria. We illustrate that end-to-end learning strategies yield parametrizations that outperform known empirical and data-driven schemes in terms of performance, stability and ability to apply to different flow configurations. These results support the relevance of differentiable programming paradigms for climate models in the future.  ( 2 min )
    Federated Learning with Partial Model Personalization. (arXiv:2204.03809v1 [cs.LG])
    We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices. Both algorithms have been proposed in the literature, but their convergence properties are not fully understood, especially for the alternating variant. We provide convergence analyses of both algorithms in the general nonconvex setting with partial participation and delineate the regime where one dominates the other. Our experiments on real-world image, text, and speech datasets demonstrate that (a) partial personalization can obtain most of the benefits of full model personalization with a small fraction of personal parameters, and, (b) the alternating update algorithm often outperforms the simultaneous update algorithm.  ( 2 min )
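    The alternating variant discussed above updates personal parameters with the shared ones frozen, then vice versa. A minimal single-device sketch under an assumed separable quadratic loss (the `grads_fn` interface, learning rate, and step counts are illustrative, not the paper's algorithm):

```python
def alternating_fl_round(shared, personal, grads_fn, lr=0.1, local_steps=2):
    """One device round: first update personal params with shared frozen,
    then update shared params with personal frozen (alternating variant)."""
    for _ in range(local_steps):
        _, g_pers = grads_fn(shared, personal)
        personal = personal - lr * g_pers
    for _ in range(local_steps):
        g_shared, _ = grads_fn(shared, personal)
        shared = shared - lr * g_shared
    return shared, personal

# Toy loss (shared - 1)^2 + (personal + 2)^2, so the optimum is (1, -2).
def grads(s, p):
    return 2.0 * (s - 1.0), 2.0 * (p + 2.0)

s1, p1 = alternating_fl_round(0.0, 0.0, grads)
```

    In the simultaneous variant, both gradient steps would instead be taken from the same `(shared, personal)` point; in a full FL system, only the shared parameters would then be aggregated across devices.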
    Decomposition-based Generation Process for Instance-Dependent Partial Label Learning. (arXiv:2204.03845v1 [cs.LG])
    Partial label learning (PLL) is a typical weakly supervised learning problem, where each training example is associated with a set of candidate labels among which only one is true. Most existing PLL approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels and model the generation process of the candidate labels in a simple way. However, these approaches usually do not perform as well as expected because the generation process of the candidate labels is always instance-dependent. Therefore, it deserves to be modeled in a refined way. In this paper, we consider instance-dependent PLL and assume that the generation process of the candidate labels can be decomposed into two sequential parts, where the correct label emerges first in the mind of the annotator, and then the incorrect labels related to the feature are also selected as candidate labels along with the correct label, due to the uncertainty of labeling. Motivated by this consideration, we propose a novel PLL method that performs Maximum A Posteriori (MAP) estimation based on an explicitly modeled generation process of candidate labels via decomposed probability distribution models. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed method.  ( 2 min )
    Q-learning with online random forests. (arXiv:2204.03771v1 [stat.ML])
    $Q$-learning is the most fundamental model-free reinforcement learning algorithm. Deployment of $Q$-learning requires approximation of the state-action value function (also known as the $Q$-function). In this work, we provide online random forests as $Q$-function approximators and propose a novel method wherein the random forest is grown as learning proceeds (through expanding forests). We demonstrate improved performance of our methods over state-of-the-art Deep $Q$-Networks in two OpenAI gyms ('blackjack' and 'inverted pendulum') but not in the 'lunar lander' gym. We suspect that the resilience to overfitting enjoyed by random forests recommends our method for common tasks that do not require a strong representation of the problem domain. We show that expanding forests (in which the number of trees increases as data comes in) improve performance, suggesting that expanding forests are viable for other applications of online random forests beyond the reinforcement learning setting.  ( 2 min )
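    The core $Q$-learning update that any $Q$-function approximator must fit is sketched below. For brevity this uses a plain table; in the paper's method the table lookup would be replaced by predictions from an online random forest that is refit (and grown) as transitions arrive. Hyperparameters here are illustrative.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward the bootstrapped target
    r + gamma * max_a' Q(s', a'). A forest approximator would instead be
    refit on (state, action) -> target pairs collected online."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Two states, two actions; one observed transition with reward 1.
Q = np.zeros((2, 2))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1)
```

    An expanding forest would add trees as data accumulates, which is one plausible source of the overfitting resilience the authors point to.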
    Quantum version of the k-NN classifier based on a quantum sorting algorithm. (arXiv:2204.03761v1 [quant-ph])
    In this work we introduce a quantum sorting algorithm with adaptable requirements of memory and circuit depth, and then use it to develop a new quantum version of the classical machine learning algorithm known as k-nearest neighbors (k-NN). Both the efficiency and performance of this new quantum version of the k-NN algorithm are compared to those of the classical k-NN and another quantum version proposed by Schuld et al. \cite{Int13}. Results show that the efficiency of both quantum algorithms is similar to each other and superior to that of the classical algorithm. On the other hand, the performance of our proposed quantum k-NN algorithm is superior to the one proposed by Schuld et al. and similar to that of the classical k-NN.  ( 2 min )
    Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition. (arXiv:2204.03793v1 [eess.AS])
    Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers. In this work, we present Personal VAD 2.0, a personalized voice activity detector that detects the voice activity of a target speaker, as part of a streaming on-device ASR system. Although previous proof-of-concept studies have validated the effectiveness of Personal VAD, there are still several critical challenges to address before this model can be used in production: first, the quality must be satisfactory in both enrollment and enrollment-less scenarios; second, it should operate in a streaming fashion; and finally, the model size should be small enough to fit within a limited latency and CPU/memory budget. To meet these multi-faceted requirements, we propose a series of novel designs: 1) advanced speaker embedding modulation methods; 2) a new training paradigm to generalize to enrollment-less conditions; 3) architecture and runtime optimizations for latency and resource restrictions. Extensive experiments on a realistic speech recognition system demonstrate the state-of-the-art performance of our proposed method.  ( 2 min )
    Free Energy Evaluation Using Marginalized Annealed Importance Sampling. (arXiv:2204.03784v1 [stat.ML])
    The evaluation of the free energy of a stochastic model is considered a significant issue in various fields of physics and machine learning. However, exact free energy evaluation is computationally infeasible because it includes an intractable partition function. Annealed importance sampling (AIS) is a type of importance sampling based on the Markov chain Monte Carlo method, similar to simulated annealing, and can effectively approximate the free energy. This study proposes a new AIS-based approach, referred to as marginalized AIS (mAIS). The statistical efficiency of mAIS is investigated in detail from theoretical and numerical perspectives. Based on this investigation, we prove that mAIS is more effective than AIS under a certain condition.  ( 2 min )
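    For readers unfamiliar with the baseline, a minimal sketch of vanilla AIS (not the marginalized variant) estimating the log partition function of a 1D Gaussian target from a standard-normal start. The geometric path, step counts, and Metropolis proposal width are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_f(x, beta, sigma=0.5):
    """Geometric path between the normalized N(0,1) density (beta=0)
    and the unnormalized target exp(-x^2 / (2 sigma^2)) (beta=1)."""
    log_f0 = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
    log_f1 = -0.5 * x**2 / sigma**2
    return (1 - beta) * log_f0 + beta * log_f1

def ais_log_z(n_chains=2000, n_steps=50):
    betas = np.linspace(0, 1, n_steps + 1)
    x = rng.standard_normal(n_chains)      # exact samples from p0
    log_w = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += log_f(x, b) - log_f(x, b_prev)
        # One Metropolis step targeting f_b to decorrelate the chains.
        prop = x + 0.5 * rng.standard_normal(n_chains)
        accept = np.log(rng.random(n_chains)) < log_f(prop, b) - log_f(x, b)
        x = np.where(accept, prop, x)
    # log of the average importance weight estimates log(Z1 / Z0), Z0 = 1.
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))

est = ais_log_z()  # true value: log(0.5 * sqrt(2*pi)) ~= 0.2258
```

    The free energy is then the negative log partition function; mAIS modifies this scheme by marginalizing over part of the state.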
    A Learnable Variational Model for Joint Multimodal MRI Reconstruction and Synthesis. (arXiv:2204.03804v1 [eess.IV])
    Generating multi-contrast/modal MRI of the same anatomy enriches diagnostic information but is limited in practice due to excessive data acquisition time. In this paper, we propose a novel deep-learning model for joint reconstruction and synthesis of multi-modal MRI using incomplete k-space data of several source modalities as inputs. The output of our model includes reconstructed images of the source modalities and a high-quality image synthesized in the target modality. Our proposed model is formulated as a variational problem that leverages several learnable modality-specific feature extractors and a multimodal synthesis module. We propose a learnable optimization algorithm to solve this model, which induces a multi-phase network whose parameters can be trained using multi-modal MRI data. Moreover, a bilevel-optimization framework is employed for robust parameter training. We demonstrate the effectiveness of our approach using extensive numerical experiments.  ( 2 min )
    A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework. (arXiv:2204.03719v1 [cs.LG])
    Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there is a lack of standardized and agreed-upon procedures on how to evaluate these algorithms. This work presents a taxonomy of algorithms for imbalanced data streams and proposes a standardized, exhaustive, and informative experimental testbed to evaluate algorithms in a collection of diverse and challenging imbalanced data stream scenarios. The experimental study evaluates 24 state-of-the-art data stream algorithms on 515 imbalanced data streams that combine static and dynamic class imbalance ratios, instance-level difficulties, concept drift, and real-world and semi-synthetic datasets in binary and multi-class scenarios. This is the largest experimental study conducted so far in the data stream mining domain. We discuss the advantages and disadvantages of state-of-the-art classifiers in each of these scenarios and provide general recommendations to end-users for selecting the best algorithms for imbalanced data streams. Additionally, we formulate open challenges and future directions for this domain. Our experimental testbed is fully reproducible and easy to extend with new methods. In this way, we propose the first standardized approach to conducting experiments on imbalanced data streams that can be used by other researchers to create trustworthy and fair evaluations of newly proposed methods. Our experimental framework can be downloaded from https://github.com/canoalberto/imbalanced-streams.  ( 2 min )
    Decentralized Event-Triggered Federated Learning with Heterogeneous Communication Thresholds. (arXiv:2204.03726v1 [cs.LG])
    A recent emphasis of distributed learning research has been on federated learning (FL), in which model training is conducted by the data-collecting devices. Existing research on FL has mostly focused on a star topology learning architecture with synchronized (time-triggered) model training rounds, where the local models of the devices are periodically aggregated by a centralized coordinating node. However, in many settings, such a coordinating node may not exist, motivating efforts to fully decentralize FL. In this work, we propose a novel methodology for distributed model aggregations via asynchronous, event-triggered consensus iterations over the network graph topology. We consider heterogeneous communication event thresholds at each device that weigh the change in local model parameters against the available local resources in deciding the benefit of aggregations at each iteration. Through theoretical analysis, we demonstrate that our methodology achieves asymptotic convergence to the globally optimal learning model under standard assumptions in distributed learning and graph consensus literature, and without restrictive connectivity requirements on the underlying topology. Subsequent numerical results demonstrate that our methodology obtains substantial improvements in communication requirements compared with FL baselines.  ( 2 min )
    Global ECG Classification by Self-Operational Neural Networks with Feature Injection. (arXiv:2204.03768v1 [cs.LG])
    Objective: Global (inter-patient) ECG classification for arrhythmia detection over electrocardiogram (ECG) signals is a challenging task for both humans and machines. The main reason is the significant variation of both normal and arrhythmic ECG patterns among patients. Automating this process with utmost accuracy is therefore highly desirable due to the advent of wearable ECG sensors. However, even with the numerous deep learning approaches proposed recently, there is still a notable gap between global and patient-specific ECG classification performance. This study proposes a novel approach to narrow this gap with a real-time solution based on shallow and compact 1D Self-Organized Operational Neural Networks (Self-ONNs). Methods: We propose a novel approach for inter-patient ECG classification using a compact 1D Self-ONN that exploits morphological and timing information in heart cycles. We use 1D Self-ONN layers to automatically learn morphological representations from ECG data, enabling us to capture the shape of the ECG waveform around the R peaks. We further inject temporal features based on the RR interval for timing characterization. The classification layers can thus benefit from both temporal and learned features for the final arrhythmia classification. Results: Using the MIT-BIH arrhythmia benchmark database, the proposed method achieves the highest classification performance ever achieved, i.e., 99.21% precision, 99.10% recall, and 99.15% F1-score for normal (N) segments; 82.19% precision, 82.50% recall, and 82.34% F1-score for supra-ventricular ectopic beats (SVEBs); and finally, 94.41% precision, 96.10% recall, and 95.2% F1-score for ventricular ectopic beats (VEBs).  ( 2 min )
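    The feature-injection idea above (a morphology window around each R peak plus the preceding RR interval) can be sketched as below; the window size and feature layout are hypothetical choices, not the paper's exact configuration.

```python
import numpy as np

def beat_features(signal, r_peaks, half_window=2):
    """For each R peak (except the first), concatenate the morphology
    window around the peak with the preceding RR interval in samples."""
    feats = []
    for i in range(1, len(r_peaks)):
        r = r_peaks[i]
        window = signal[r - half_window : r + half_window + 1]
        rr = r_peaks[i] - r_peaks[i - 1]
        feats.append(np.concatenate([window, [rr]]))
    return np.array(feats)

# Toy signal where sample values equal their indices, for easy checking.
sig = np.arange(20, dtype=float)
peaks = [3, 8, 15]
X = beat_features(sig, peaks)
```

    In the paper the window would feed the Self-ONN layers while the RR feature is injected at the classification layers; here both are simply concatenated per beat.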
    Automated Design of Salient Object Detection Algorithms with Brain Programming. (arXiv:2204.03722v1 [cs.CV])
    Despite recent improvements in computer vision, the design of artificial visual systems is still daunting since an explanation of visual computing algorithms remains elusive. Salient object detection is one problem that is still open due to the difficulty of understanding the brain's inner workings. Progress in this research area has followed the traditional path of hand-made designs using neuroscience knowledge. In recent years, two different approaches based on genetic programming have appeared to enhance this technique. One follows the idea of combining previous hand-made methods through genetic programming and fuzzy logic. The other approach consists of improving the inner computational structures of basic hand-made models through artificial evolution. This research work proposes expanding the artificial dorsal stream using a recent proposal to solve salient object detection problems. This approach uses the benefits of the two main aspects of this research area: fixation prediction and detection of salient objects. We apply the fusion of visual saliency and image segmentation algorithms as a template. The proposed methodology discovers several critical structures in the template through artificial evolution. We present results on a benchmark designed by experts, with outstanding results in comparison with the state-of-the-art.  ( 2 min )
    Compositional Generalization and Decomposition in Neural Program Synthesis. (arXiv:2204.03758v1 [cs.LG])
    When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, what we can measure is whether they compositionally generalize, that is, whether a model that has been trained on the simpler subtasks is subsequently able to solve more complex tasks. In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize. We first characterize several different axes along which program synthesis methods would be desired to generalize, e.g., length generalization, or the ability to combine known subroutines in new ways that do not occur in the training data. Based on this characterization, we introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets, SCAN and RobustFill. Finally, we make first attempts to improve the compositional generalization ability of Transformer models along these axes through novel attention mechanisms that draw inspiration from a human-like decomposition strategy. Empirically, we find our modified Transformer models generally perform better than natural baselines, but the tasks remain challenging.  ( 2 min )
    Physics-assisted Generative Adversarial Network for X-Ray Tomography. (arXiv:2204.03703v1 [eess.IV])
    X-ray tomography is capable of imaging the interior of objects in three dimensions non-invasively, with applications in biomedical imaging, materials science, electronic inspection, and other fields. The reconstruction process can be an ill-conditioned inverse problem, requiring regularization to obtain satisfactory reconstructions. Recently, deep learning has been adopted for tomographic reconstruction. Unlike iterative algorithms, which require a distribution that is known a priori, deep reconstruction networks can learn a prior distribution by sampling the training distributions. In this work, we develop a Physics-assisted Generative Adversarial Network (PGAN), a two-step algorithm for tomographic reconstruction. In contrast to previous efforts, our PGAN utilizes maximum-likelihood estimates derived from the measurements to regularize the reconstruction with both known physics and the learned prior. The synthetic objects with spatial correlations are integrated circuits (ICs) from a proposed generative model, CircuitFaker. Compared with maximum-likelihood estimation, PGAN can reduce the photon requirement with limited projection angles to achieve a given error rate. We further attribute the improvement to the learned prior by reconstructing objects created without spatial correlations. The advantages of using a prior from deep learning in X-ray tomography may further enable low-photon nanoscale imaging.  ( 2 min )
    GreaseVision: Rewriting the Rules of the Interface. (arXiv:2204.03731v1 [cs.HC])
    Digital harms can manifest across any interface. Key problems in addressing these harms include the high individuality of harms and the fast-changing nature of digital systems. As a result, we still lack a systematic approach to study harms and produce interventions for end-users. We put forward GreaseVision, a new framework that enables end-users to collaboratively develop interventions against harms in software using a no-code approach and recent advances in few-shot machine learning. The framework and tool allow individual end-users to study their usage history and create personalized interventions. Our contribution also enables researchers to study the distribution of harms and interventions at scale.  ( 2 min )
    Mixing Signals: Data Augmentation Approach for Deep Learning Based Modulation Recognition. (arXiv:2204.03737v1 [eess.SP])
    With the rapid development of deep learning, automatic modulation recognition (AMR), an important task in cognitive radio, has gradually transformed from traditional feature extraction and classification to automatic classification by deep learning technology. However, deep learning models are data-driven methods, which often require a large amount of data for training. Data augmentation, as a strategy for expanding datasets, can improve the generalization of deep learning models and thus improve their accuracy to a certain extent. In this paper, for AMR of radio signals, we propose a data augmentation strategy based on mixing signals and consider four specific methods (Random Mixing, Maximum-Similarity-Mixing, $\theta$-Similarity Mixing, and n-times Random Mixing) to achieve data augmentation. Experiments show that our proposed method can improve the classification accuracy of deep learning based AMR models on the full public dataset RML2016.10a. In particular, for the case of a single signal-to-noise ratio signal set, the classification accuracy can be significantly improved, which verifies the effectiveness of the methods.  ( 2 min )
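    A minimal sketch of one of the four strategies named above, Maximum-Similarity-Mixing, under the assumption that each signal is mixed with its most similar same-class partner (cosine similarity) while the label is kept; the mixing weight `lam` and the similarity measure are illustrative choices.

```python
import numpy as np

def max_similarity_mix(signals, labels, lam=0.5):
    """Mix each signal with the most cosine-similar other signal of the
    same class; labels are unchanged, so the dataset size is preserved."""
    out = []
    for i, (x, y) in enumerate(zip(signals, labels)):
        best, best_sim = None, -np.inf
        for j, (s, l) in enumerate(zip(signals, labels)):
            if j == i or l != y:
                continue
            sim = np.dot(x, s) / (np.linalg.norm(x) * np.linalg.norm(s))
            if sim > best_sim:
                best, best_sim = s, sim
        out.append(x + lam * best if best is not None else x)
    return np.array(out)

# Toy I/Q-like vectors: two class-0 signals along one axis, one class-1 signal.
signals = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
labels = [0, 0, 1]
mixed = max_similarity_mix(signals, labels)
```

    Random Mixing would instead pick the partner uniformly at random, and n-times Random Mixing would repeat the procedure n times to multiply the dataset size.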
    A Kernel Method to Nonlinear Location Estimation with RSS-based Fingerprint. (arXiv:2204.03724v1 [cs.NI])
    This paper presents a nonlinear location estimation method to infer the position of a user holding a smartphone. We consider a large location with $M$ grid points, where each grid point is labeled with a unique fingerprint consisting of the received signal strength (RSS) values measured from $N$ Bluetooth Low Energy (BLE) beacons. Given the fingerprint observed by the smartphone, the user's current location can be estimated by finding the top-k most similar fingerprints among those registered in the database. Besides environmental factors, the dynamics of holding the smartphone are another source of variation in fingerprint measurements, yet few studies address the fingerprint variability due to dynamic smartphone positions held by human hands during online detection. To this end, we propose a nonlinear location estimation using the kernel method. Specifically, our proposed method comprises two steps: 1) a beacon selection strategy to select a subset of beacons that is insensitive to subtle changes in holding position, and 2) a kernel method to compute the similarity between this subset of observed signals and all the fingerprints registered in the database. Experimental results based on large-scale data collected in a complex building indicate a substantial performance gain of our proposed approach in comparison to state-of-the-art methods. The dataset consisting of the signal information collected from the beacons is available online.  ( 2 min )
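    The second step above (kernel similarity against registered fingerprints, then top-k retrieval) can be sketched as below. The Gaussian (RBF) kernel and the `gamma` value are assumptions for illustration; the paper's kernel and its beacon selection step are not reproduced here.

```python
import numpy as np

def topk_locations(observed, fingerprints, k=2, gamma=0.1):
    """RBF-kernel similarity between the observed RSS vector and each
    registered fingerprint; returns indices of the top-k matches.
    Both inputs are assumed restricted to the selected beacon subset."""
    d2 = np.sum((fingerprints - observed) ** 2, axis=1)
    sims = np.exp(-gamma * d2)
    return np.argsort(-sims)[:k], sims

# Three registered grid points, two selected beacons (RSS in dBm).
fp = np.array([[-60.0, -70.0], [-62.0, -71.0], [-80.0, -50.0]])
obs = np.array([-61.0, -70.0])
idx, sims = topk_locations(obs, fp)
```

    The user's position estimate can then be taken as, e.g., the similarity-weighted average of the top-k grid point coordinates.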
    Introducing a Framework and a Decision Protocol to Calibrate Recommender Systems. (arXiv:2204.03706v1 [cs.IR])
    Recommender systems use the user's profile to generate a recommendation list of unknown items for a target user. Although the primary goal of traditional recommendation systems is to deliver the most relevant items, such an effort can unintentionally cause collateral effects, including low diversity and unbalanced genres or categories, benefiting particular groups of categories. This paper proposes an approach to create recommendation lists with a calibrated balance of genres, avoiding disproportion between the user's profile interests and the recommendation list. The calibrated recommendations concomitantly consider the relevance and the divergence between the genre distributions extracted from the user's preferences and the recommendation list. The main claim is that calibration can contribute positively to generating fairer recommendations. In particular, we propose a new trade-off equation that accounts for the user's bias in order to provide a recommendation list that follows the user's tendencies. Moreover, we propose a conceptual framework and a decision protocol to generate more than one thousand combinations of calibrated systems in order to find the best combination. We compare our approach against state-of-the-art approaches using multiple domain datasets, which are analyzed with ranking and calibration metrics. The results indicate that the trade-off that considers the user's bias produces positive effects on both precision and fairness, thus generating recommendation lists that respect the genre distribution; through the decision protocol, we also find the best system for each dataset.  ( 2 min )
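    The relevance-versus-calibration trade-off described above is commonly realized as a greedy list construction maximizing a weighted objective. The sketch below uses a KL divergence between the user's genre distribution and the list's, with a fixed weight `lam`; this is a generic calibration sketch, not the paper's specific trade-off equation.

```python
import numpy as np

def kl(p, q, eps=1e-9):
    """KL divergence with smoothing to tolerate zero entries."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def calibrated_list(items, scores, genres, target_dist, n, lam=0.5):
    """Greedily build an n-item list trading off total relevance against
    divergence of the list's genre distribution from the user's."""
    chosen, remaining = [], list(range(len(items)))
    for _ in range(n):
        best, best_obj = None, -np.inf
        for i in remaining:
            cand = chosen + [i]
            rel = sum(scores[j] for j in cand)
            counts = np.zeros(len(target_dist))
            for j in cand:
                counts[genres[j]] += 1
            obj = (1 - lam) * rel - lam * kl(target_dist, counts / counts.sum())
            if obj > best_obj:
                best, best_obj = i, obj
        chosen.append(best)
        remaining.remove(best)
    return [items[i] for i in chosen]

# Two genres; the user likes them 50/50, so a pure-relevance top-2 would be unfair.
result = calibrated_list(["a", "b", "c", "d"], [0.9, 0.8, 0.7, 0.6],
                         genres=[0, 0, 1, 1], target_dist=[0.5, 0.5], n=2)
```

    Sweeping `lam` (and the divergence/weighting choices) is essentially what the paper's decision protocol automates across its thousand-plus system combinations.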
    TemporalUV: Capturing Loose Clothing with Temporally Coherent UV Coordinates. (arXiv:2204.03671v1 [cs.CV])
    We propose a novel approach to generate temporally coherent UV coordinates for loose clothing. Our method is not constrained by human body outlines and can capture loose garments and hair. We implemented a differentiable pipeline to learn UV mapping between a sequence of RGB inputs and textures via UV coordinates. Instead of treating the UV coordinates of each frame separately, our data generation approach connects all UV coordinates via feature matching for temporal stability. Subsequently, a generative model is trained to balance the spatial quality and temporal stability. It is driven by supervised and unsupervised losses in both UV and image spaces. Our experiments show that the trained models output high-quality UV coordinates and generalize to new poses. Once a sequence of UV coordinates has been inferred by our model, it can be used to flexibly synthesize new looks and modified visual styles. Compared to existing methods, our approach reduces the computational workload to animate new outfits by several orders of magnitude.  ( 2 min )
    T4PdM: a Deep Neural Network based on the Transformer Architecture for Fault Diagnosis of Rotating Machinery. (arXiv:2204.03725v1 [cs.AI])
    Deep learning and big data algorithms have become widely used in industrial applications to optimize several tasks in many complex systems. In particular, deep learning models for diagnosing and prognosticating machinery health have made predictive maintenance (PdM) more accurate and reliable in decision making, thereby avoiding unnecessary interventions, machinery accidents, and environmental catastrophes. Recently, Transformer neural networks have gained notoriety and have increasingly become the favorite choice for Natural Language Processing (NLP) tasks. Given their recent major achievements in NLP, this paper proposes an automatic fault classifier model for predictive maintenance based on a modified version of the Transformer architecture, namely T4PdM, to identify multiple types of faults in rotating machinery. Experimental results are developed and presented for the MaFaulDa and CWRU databases. T4PdM achieved an overall accuracy of 99.98% and 98% for the two datasets, respectively. In addition, the performance of the proposed model is compared to other previously published works, demonstrating its superiority in detecting and classifying faults in rotating industrial machinery. The proposed Transformer-based model can therefore improve the performance of machinery fault analysis and diagnostic processes and help move companies toward a new era of Industry 4.0. In addition, this methodology can be adapted to any other task of time series classification.  ( 2 min )
    Brain-Inspired Hyperdimensional Computing: How Thermal-Friendly for Edge Computing?. (arXiv:2204.03739v1 [cs.ET])
Brain-inspired hyperdimensional computing (HDC) is an emerging machine learning (ML) method. It is based on large vectors of binary or bipolar symbols and a few simple mathematical operations. The promise of HDC is a highly efficient implementation for embedded systems like wearables. While fast implementations have been presented, other constraints have not been considered for edge computing. In this work, we aim to answer how thermal-friendly HDC is for edge computing. Devices like smartwatches, smart glasses, or even mobile systems have a restrictive cooling budget due to their limited volume. Although HDC operations are simple, the vectors are large, resulting in a high number of CPU operations and thus a heavy load on the entire system, potentially causing temperature violations. In this work, the impact of HDC on the chip's temperature is investigated for the first time. We measure the temperature and power consumption of a commercial embedded system and compare HDC with a conventional CNN. We reveal that HDC causes up to 6.8°C higher temperatures and leads to up to 47% more CPU throttling. Even when both HDC and the CNN aim for the same throughput (i.e., perform a similar number of classifications per second), HDC still causes higher on-chip temperatures due to its larger power consumption.  ( 2 min )
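The HDC primitives the abstract alludes to (large bipolar vectors, a handful of cheap elementwise operations) can be sketched in a few lines. This is an illustrative toy, not the paper's measured implementation; the dimensionality and the specific operation set are our choices:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality

def random_hv():
    # Random bipolar hypervector of +1/-1 symbols.
    return rng.choice([-1, 1], size=D)

def bundle(vectors):
    # Elementwise majority vote: superposes several hypervectors into one.
    return np.sign(np.sum(vectors, axis=0))

def similarity(a, b):
    # Normalized dot product: ~1 for identical, ~0 for unrelated vectors.
    return float(a @ b) / D

a, b = random_hv(), random_hv()
combined = bundle([a, b])
# The bundle stays similar to its constituents but not to fresh random vectors.
```

Each operation is trivial, but every dot product and sum runs over very long vectors, which is exactly the sustained CPU load behind the thermal behaviour the paper measures.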
    Using Multiple Self-Supervised Tasks Improves Model Robustness. (arXiv:2204.03714v1 [cs.CV])
    Deep networks achieve state-of-the-art performance on computer vision tasks, yet they fail under adversarial attacks that are imperceptible to humans. In this paper, we propose a novel defense that can dynamically adapt the input using the intrinsic structure from multiple self-supervised tasks. By simultaneously using many self-supervised tasks, our defense avoids over-fitting the adapted image to one specific self-supervised task and restores more intrinsic structure in the image compared to a single self-supervised task approach. Our approach further improves robustness and clean accuracy significantly compared to the state-of-the-art single task self-supervised defense. Our work is the first to connect multiple self-supervised tasks to robustness, and suggests that we can achieve better robustness with more intrinsic signal from visual data.  ( 2 min )
    BankNote-Net: Open dataset for assistive universal currency recognition. (arXiv:2204.03738v1 [cs.CV])
Millions of people around the world have low or no vision. Assistive software applications have been developed for a variety of day-to-day tasks, including optical character recognition, scene identification, person recognition, and currency recognition. This last task, the recognition of banknotes from different denominations, has been addressed by the use of computer vision models for image recognition. However, the datasets and models available for this task are limited, both in terms of dataset size and in the variety of currencies covered. In this work, we collect a total of 24,826 images of banknotes in a variety of assistive settings, spanning 17 currencies and 112 denominations. Using supervised contrastive learning, we develop a machine learning model for universal currency recognition. This model learns compliant embeddings of banknote images in a variety of contexts, which can be shared publicly (as a compressed vector representation) and can be used to train and test specialized downstream models for any currency, including those not covered by our dataset or for which only a few real images per denomination are available (few-shot learning). We deploy a variation of this model for public use in the latest version of the Seeing AI app developed by Microsoft. We share our encoder model and the embeddings as an open dataset in our BankNote-Net repository.  ( 2 min )
    Learning to Walk Autonomously via Reset-Free Quality-Diversity. (arXiv:2204.03655v1 [cs.LG])
Quality-Diversity (QD) algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills. However, the generation of behavioural repertoires has mainly been limited to simulation environments instead of real-world learning. This is because existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions. This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments. We build on Dynamics-Aware Quality-Diversity (DA-QD) and introduce a behaviour selection policy that leverages the diversity of the imagined repertoire and environmental information to intelligently select behaviours that can act as automatic resets. We demonstrate this through a task of learning to walk within defined training zones with obstacles. Our experiments show that we can learn full repertoires of legged locomotion controllers autonomously, without manual resets and with high sample efficiency, in spite of harsh safety constraints. Finally, using an ablation of different target objectives, we show that it is important for RF-QD to have diverse types of solutions available to the behaviour selection policy, rather than solutions optimised for a specific objective. Videos and code available at https://sites.google.com/view/rf-qd.  ( 2 min )
    Identification of Autism spectrum disorder based on a novel feature selection method and Variational Autoencoder. (arXiv:2204.03654v1 [eess.IV])
The development of noninvasive brain imaging such as resting-state functional magnetic resonance imaging (rs-fMRI), and its combination with AI algorithms, provides a promising solution for the early diagnosis of Autism spectrum disorder (ASD). However, the performance of current ASD classification based on rs-fMRI still needs to be improved. This paper introduces a classification framework to aid ASD diagnosis based on rs-fMRI. In the framework, we propose a novel filter feature selection method based on the difference between step distribution curves (DSDC) to select remarkable functional connectivities (FCs), and utilize a multilayer perceptron (MLP) pretrained by a simplified Variational Autoencoder (VAE) for classification. We also design a pipeline consisting of a normalization procedure and a modified hyperbolic tangent (tanh) activation function to replace the original tanh function, further improving the model accuracy. Our model was evaluated by ten repetitions of 10-fold cross-validation and achieved an average accuracy of 78.12%, outperforming the state-of-the-art methods reported on the same dataset. Given the importance of sensitivity and specificity in disease diagnosis, two constraints were designed in our model which can improve the model's sensitivity and specificity by up to 9.32% and 10.21%, respectively. The added constraints allow our model to handle different application scenarios and can be used broadly.  ( 2 min )
    Qade: Solving Differential Equations on Quantum Annealers. (arXiv:2204.03657v1 [quant-ph])
    We present a general method, called Qade, for solving differential equations using a quantum annealer. The solution is obtained as a linear combination of a set of basis functions. On current devices, Qade can solve systems of coupled partial differential equations that depend linearly on the solution and its derivatives, with non-linear variable coefficients and arbitrary inhomogeneous terms. We test the method with several examples and find that state-of-the-art quantum annealers can find the solution accurately for problems requiring a small enough function basis. We provide a Python package implementing the method at gitlab.com/jccriado/qade.  ( 2 min )
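The classical core of the approach, writing the solution as a linear combination of basis functions and minimizing the squared residual of the equation, can be sketched without an annealer; Qade instead encodes an analogous quadratic objective for quantum annealing hardware. The monomial basis, collocation points, and boundary weight below are our assumptions:

```python
import numpy as np

# Solve u'(x) = u(x), u(0) = 1 on [0, 1] by least-squares collocation:
# write u(x) = sum_k c_k x^k and minimize the ODE residual at sample
# points plus a heavily weighted boundary-condition row.
N = 8
x = np.linspace(0.0, 1.0, 50)
phi = np.stack([x**k for k in range(N)], axis=1)                  # basis values
dphi = np.stack([k * x**(k - 1) if k > 0 else np.zeros_like(x)
                 for k in range(N)], axis=1)                      # derivatives

A = np.vstack([dphi - phi, 100.0 * phi[:1]])   # residual rows + boundary row
b = np.concatenate([np.zeros(len(x)), [100.0]])
c, *_ = np.linalg.lstsq(A, b, rcond=None)

max_err = np.max(np.abs(phi @ c - np.exp(x)))  # compare with the exact solution
```

Because the objective is quadratic in the coefficients, it maps naturally onto the quadratic cost functions that annealers minimize, which is what makes the basis-expansion formulation attractive for this hardware.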
    Adaptive-Gravity: A Defense Against Adversarial Samples. (arXiv:2204.03694v1 [cs.LG])
This paper presents a novel model training solution, denoted as Adaptive-Gravity, for enhancing the robustness of deep neural network classifiers against adversarial examples. We conceptualize the model parameters/features associated with each class as a mass characterized by its centroid location and the spread (standard deviation of the distance) of features around the centroid. We use the centroid associated with each cluster to derive an anti-gravity force that pushes the centroids of different classes away from one another during network training. We then customize an objective function that aims to concentrate each class's features toward the corresponding new centroid obtained via the anti-gravity force. This methodology results in a larger separation between different masses and reduces the spread of features around each centroid. As a result, the samples are pushed away from the space that adversarial examples could be mapped to, effectively increasing the degree of perturbation needed to make an adversarial example. We have implemented this training solution as an iterative method consisting of four steps at each iteration: 1) centroid extraction, 2) anti-gravity force calculation, 3) centroid relocation, and 4) gravity training. Adaptive-Gravity's efficiency is evaluated by measuring the corresponding fooling rates against various attack models, including FGSM, MIM, BIM, and PGD, using LeNet and ResNet110 networks, benchmarked against the MNIST and CIFAR10 classification problems. Test results show that Adaptive-Gravity not only functions as a powerful instrument to robustify a model against state-of-the-art adversarial attacks but also effectively improves the model's training accuracy.  ( 2 min )
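A toy version of the anti-gravity step (step 2 of the iteration above) illustrates the idea: each class centroid is pushed away from the others by a pairwise repulsive force. The inverse-square force law and step size here are our assumptions, not necessarily the paper's exact formulation:

```python
import numpy as np

def repel_centroids(centroids, step=0.1):
    # One anti-gravity update: move each centroid along the net
    # repulsive force exerted on it by all other class centroids.
    centroids = centroids.astype(float).copy()
    forces = np.zeros_like(centroids)
    for i in range(len(centroids)):
        for j in range(len(centroids)):
            if i == j:
                continue
            d = centroids[i] - centroids[j]
            dist = np.linalg.norm(d)
            forces[i] += d / dist**3   # inverse-square repulsion along d
    return centroids + step * forces

c = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # three class centroids
c_new = repel_centroids(c)  # all pairwise distances grow after one step
```

After relocation, the gravity-training loss would pull each class's features toward its relocated centroid, widening the margins an adversarial perturbation must cross.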
  • Open

    MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories. (arXiv:2106.01808v3 [cs.LG] UPDATED)
    Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice. One class of methods uses data simulated with different parameters to infer models of the likelihood-to-evidence ratio, or equivalently the posterior function. Here we frame the inference task as an estimation of an energy function parametrized with an artificial neural network. We present an intuitive approach where the optimal model of the likelihood-to-evidence ratio is found by maximizing the likelihood of simulated data. Within this framework, the connection between the task of simulation-based inference and mutual information maximization is clear, and we show how several known methods of posterior estimation relate to alternative lower bounds to mutual information. These distinct objective functions aim at the same optimal energy form and therefore can be directly benchmarked. We compare their accuracy in the inference of model parameters, focusing on four dynamical systems that encompass common challenges in time series analysis: dynamics driven by multiplicative noise, nonlinear interactions, chaotic behavior, and high-dimensional parameter space.  ( 2 min )
    Overlapping Spaces for Compact Graph Representations. (arXiv:2007.02445v3 [cs.LG] UPDATED)
Various non-trivial spaces are becoming popular for embedding structured data such as graphs, texts, or images. Following spherical and hyperbolic spaces, more general product spaces have been proposed. However, searching for the best configuration of a product space is a resource-intensive procedure, which reduces the practical applicability of the idea. We generalize the concept of product space and introduce an overlapping space that does not have the configuration search problem. The main idea is to allow subsets of coordinates to be shared between spaces of different types (Euclidean, hyperbolic, spherical). As a result, parameter optimization automatically learns the optimal configuration. Additionally, overlapping spaces allow for more compact representations since their geometry is more complex. Our experiments confirm that overlapping spaces outperform the competitors in graph embedding tasks. Here, we consider both the distortion setup, where the aim is to preserve distances, and the ranking setup, where the relative order should be preserved; the proposed method outperforms the competitors in both settings. We also perform an empirical analysis in a realistic information retrieval task, where we compare all spaces by incorporating them into DSSM. In this case, the proposed overlapping space consistently achieves nearly optimal results without any configuration tuning. This allows for reducing training time, which can be significant in large-scale applications.  ( 2 min )
    Identifiability of Label Noise Transition Matrix. (arXiv:2202.02016v2 [cs.LG] UPDATED)
    The noise transition matrix plays a central role in the problem of learning from noisy labels. Among many other reasons, a significant number of existing solutions rely on access to it. Estimating the transition matrix without using ground truth labels is a critical and challenging task. When label noise transition depends on each instance, the problem of identifying the instance-dependent noise transition matrix becomes substantially more challenging. Despite recent works proposing solutions for learning from instance-dependent noisy labels, we lack a unified understanding of when such a problem remains identifiable, and therefore learnable. This paper seeks to provide answers to a sequence of related questions: What are the primary factors that contribute to the identifiability of a noise transition matrix? Can we explain the observed empirical successes? When a problem is not identifiable, what can we do to make it so? We will relate our theoretical findings to the literature and hope to provide guidelines for developing effective solutions for battling instance-dependent label noise.  ( 2 min )
    Active Linear Regression for $\ell_p$ Norms and Beyond. (arXiv:2111.04888v3 [cs.LG] UPDATED)
    We study active sampling algorithms for linear regression, which aim to query only a few entries of a target vector $b\in\mathbb R^n$ and output a near minimizer to $\min_{x\in\mathbb R^d} \|Ax-b\|$, for a design matrix $A\in\mathbb R^{n \times d}$ and loss $\|\cdot\|$. For $p$ norm regression for any $0<p<\infty$, we give an algorithm based on Lewis weight sampling outputting a $(1+\epsilon)$-approximate solution using just $\tilde O(d/\epsilon^2)$ queries to $b$ for $p\in(0,1)$, $\tilde{O}(d/\epsilon)$ queries for $1<p<2$, and $\tilde{O}(d^{p/2}/\epsilon^p)$ queries for $2<p<\infty$. For $0<p<2$, our bounds are optimal up to log factors, settling the query complexity for this range. For $2<p<\infty$, our dependence on $d$ is optimal, while our dependence on $\epsilon$ is off by at most $\epsilon$, up to log factors. Our result resolves an open question of [CD21], who gave near optimal bounds for the $1$ norm, but required $d^2/\epsilon^2$ samples for $\ell_p$ regression with $1<p<2$, and gave no bounds for $2<p<\infty$ or $0<p<1$. We also give the first total sensitivity bound of $O(d^{\max\{1,p/2\}}\log^2n)$ for loss functions of degree $p$ polynomial growth, improving a result of [TMF20]. By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21]. For the Huber loss, we further improve our bound to $\tilde O(d^{4-2\sqrt2}/\mathrm{poly}(\epsilon))$ samples. Our sensitivity bounds also have many applications, including Orlicz norm subspace embeddings, robust subspace approximation, and dimension reduction for smoothed $p$-norms. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $p$ norm.  ( 3 min )
    On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging. (arXiv:2107.00464v4 [math.OC] UPDATED)
We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrast with the basic SEG method, whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and the rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.  ( 2 min )
    Covariance-Free Sparse Bayesian Learning. (arXiv:2105.10439v2 [eess.SP] UPDATED)
    Sparse Bayesian learning (SBL) is a powerful framework for tackling the sparse coding problem while also providing uncertainty quantification. The most popular inference algorithms for SBL exhibit prohibitively large computational costs for high-dimensional problems due to the need to maintain a large covariance matrix. To resolve this issue, we introduce a new method for accelerating SBL inference -- named covariance-free expectation maximization (CoFEM) -- that avoids explicit computation of the covariance matrix. CoFEM solves multiple linear systems to obtain unbiased estimates of the posterior statistics needed by SBL. This is accomplished by exploiting innovations from numerical linear algebra such as preconditioned conjugate gradient and a little-known diagonal estimation rule. For a large class of compressed sensing matrices, we provide theoretical justifications for why our method scales well in high-dimensional settings. Through simulations, we show that CoFEM can be up to thousands of times faster than existing baselines without sacrificing coding accuracy. Through applications to calcium imaging deconvolution and multi-contrast MRI reconstruction, we show that CoFEM enables SBL to tractably tackle high-dimensional sparse coding problems of practical interest.  ( 2 min )
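The "little-known diagonal estimation rule" is, to our understanding, a probing estimator in the spirit of Bekas, Kokiopoulou, and Saad: the diagonal of a matrix available only through matrix-vector products is recovered from Rademacher probes. A minimal sketch, with probe count and test matrix chosen by us:

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_diag(matvec, n, num_probes=2000):
    # diag(A) ≈ mean over probes of z ⊙ (A z), with z drawn from ±1,
    # requiring only matrix-vector products with A (never A itself).
    acc = np.zeros(n)
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        acc += z * matvec(z)
    return acc / num_probes

A = rng.standard_normal((6, 6))
A = A @ A.T                      # SPD test matrix, like an SBL posterior covariance
d_est = estimate_diag(lambda v: A @ v, 6)   # close to np.diag(A)
```

In CoFEM the products with the covariance matrix are themselves obtained through (preconditioned conjugate gradient) linear solves, so the large covariance matrix is never formed explicitly, only the posterior statistics SBL actually needs.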
    Q-learning with online random forests. (arXiv:2204.03771v1 [stat.ML])
$Q$-learning is the most fundamental model-free reinforcement learning algorithm. Deployment of $Q$-learning requires approximation of the state-action value function (also known as the $Q$-function). In this work, we provide online random forests as $Q$-function approximators and propose a novel method wherein the random forest is grown as learning proceeds (through expanding forests). We demonstrate improved performance of our methods over state-of-the-art Deep $Q$-Networks in two OpenAI gyms ("blackjack" and "inverted pendulum") but not in the "lunar lander" gym. We suspect that the resilience to overfitting enjoyed by random forests recommends our method for common tasks that do not require a strong representation of the problem domain. We show that expanding forests (in which the number of trees increases as data comes in) improve performance, suggesting that expanding forests are viable for other applications of online random forests beyond the reinforcement learning setting.  ( 2 min )
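For readers unfamiliar with the baseline being modified: the $Q$-learning update itself is independent of how the $Q$-function is represented. A tabular toy on a five-state corridor (the paper swaps the table for an online random forest regressor; the environment and hyperparameters here are our choices):

```python
import numpy as np

n_states, n_actions = 5, 2         # states 0..4; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration
rng = np.random.default_rng(0)

for _ in range(500):               # episodes; reward 1 on reaching state 4
    s = 0
    while s != n_states - 1:
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s2 = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s2 == n_states - 1 else 0.0
        done = s2 == n_states - 1
        # Q-learning update: bootstrap from the greedy value of s2
        target = r + (0.0 if done else gamma * np.max(Q[s2]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
```

Replacing the table `Q` with a regressor fitted online to `(state, action) → target` pairs gives the function-approximation variant the paper studies; the update rule is unchanged.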
    Pretext Tasks selection for multitask self-supervised speech representation learning. (arXiv:2107.00594v4 [eess.AS] UPDATED)
Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In audio/speech signal processing, a wide range of features were engineered through decades of research efforts. As it turns out, learning to predict such features (a.k.a. pseudo-labels) has proven to be a particularly relevant pretext task, leading to useful self-supervised representations that prove effective for downstream tasks. However, methods and common practices for combining such pretext tasks for better performance on the downstream task have not been explored and understood properly. In fact, the process relies almost exclusively on a computationally heavy experimental procedure, which becomes intractable as the number of pretext tasks increases. This paper introduces a method to select a group of pretext tasks among a set of candidates. The method we propose estimates calibrated weights for the partial losses corresponding to the considered pretext tasks during the self-supervised training process. The experiments conducted on automatic speech recognition, speaker recognition, and emotion recognition validate our approach, as the groups selected and weighted with our method perform better than classic baselines, thus facilitating the selection and combination of relevant pseudo-labels for self-supervised representation learning.  ( 2 min )
    Learning Polynomial Transformations. (arXiv:2204.04209v1 [cs.LG])
We consider the problem of learning high dimensional polynomial transformations of Gaussians. Given samples of the form $p(x)$, where $x\sim N(0, \mathrm{Id}_r)$ is hidden and $p: \mathbb{R}^r \to \mathbb{R}^d$ is a function where every output coordinate is a low-degree polynomial, the goal is to learn the distribution over $p(x)$. This problem is natural in its own right, but is also an important special case of learning deep generative models, namely pushforwards of Gaussians under two-layer neural networks with polynomial activations. Understanding the learnability of such generative models is crucial to understanding why they perform so well in practice. Our first main result is a polynomial-time algorithm for learning quadratic transformations of Gaussians in a smoothed setting. Our second main result is a polynomial-time algorithm for learning constant-degree polynomial transformations of Gaussians in a smoothed setting, when the rank of the associated tensors is small. In fact our results extend to any rotation-invariant input distribution, not just Gaussian. These are the first end-to-end guarantees for learning a pushforward under a neural network with more than one layer. Along the way, we also give the first polynomial-time algorithms with provable guarantees for tensor ring decomposition, a popular generalization of tensor decomposition that is used in practice to implicitly store large tensors.  ( 2 min )
    Two-stage Training of Graph Neural Networks for Graph Classification. (arXiv:2011.05097v4 [cs.LG] UPDATED)
Graph neural networks (GNNs) have received massive attention in the field of machine learning on graphs. Inspired by the success of neural networks, a line of research has been conducted to train GNNs to deal with various tasks, such as node classification, graph classification, and link prediction. In this work, our task of interest is graph classification. Several GNN models have been proposed and have shown great accuracy in this task. However, the question is whether usual training methods fully realize the capacity of the GNN models. In this work, we propose a two-stage training framework based on triplet loss. In the first stage, a GNN is trained to map each graph to a Euclidean-space vector so that graphs of the same class are close while those of different classes are mapped far apart. Once graphs are well-separated based on labels, a classifier is trained to distinguish between different classes. This method is generic in the sense that it is compatible with any GNN model. By adapting five GNN models to our method, we demonstrate consistent improvements in accuracy and utilization of each GNN's allocated capacity over each model's original training method, by up to 5.4 percentage points across 12 datasets.  ( 2 min )
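The first-stage objective is the standard margin-based triplet loss on embedding vectors. A minimal sketch (the margin value and toy embeddings are arbitrary illustrations, not the paper's settings):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Penalize triplets where the anchor is not closer to the positive
    # (same class) than to the negative (other class) by at least `margin`.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same-class embedding: close to the anchor
n = np.array([3.0, 0.0])   # other-class embedding: far from the anchor
good = triplet_loss(a, p, n)   # well-separated triplet -> zero loss
bad = triplet_loss(a, n, p)    # violated triplet -> positive loss
```

Training the GNN to drive this loss to zero over sampled triplets is what separates the classes in embedding space before the second-stage classifier is fit.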
    Sample Complexity versus Depth: An Information Theoretic Analysis. (arXiv:2203.00246v3 [cs.LG] UPDATED)
    Deep learning has proven effective across a range of data sets. In light of this, a natural inquiry is: "for what data generating processes can deep learning succeed?" In this work, we study the sample complexity of learning multilayer data generating processes of a sort for which deep neural networks seem to be suited. We develop general and elegant information-theoretic tools that accommodate analysis of any data generating process -- shallow or deep, parametric or nonparametric, noiseless or noisy. We then use these tools to characterize the dependence of sample complexity on the depth of multilayer processes. Our results indicate roughly linear dependence on depth. This is in contrast to previous results that suggest exponential or high-order polynomial dependence.  ( 2 min )
    TF-Coder: Program Synthesis for Tensor Manipulations. (arXiv:2003.09040v4 [cs.PL] UPDATED)
    The success and popularity of deep learning is on the rise, partially due to powerful deep learning frameworks such as TensorFlow and PyTorch that make it easier to develop deep learning models. However, these libraries also come with steep learning curves, since programming in these frameworks is quite different from traditional imperative programming with explicit loops and conditionals. In this work, we present a tool called TF-Coder for programming by example in TensorFlow. TF-Coder uses a bottom-up weighted enumerative search, with value-based pruning of equivalent expressions and flexible type- and value-based filtering to ensure that expressions adhere to various requirements imposed by the TensorFlow library. We train models to predict TensorFlow operations from features of the input and output tensors and natural language descriptions of tasks, to prioritize relevant operations during search. TF-Coder solves 63 of 70 real-world tasks within 5 minutes, sometimes finding simpler solutions in less time compared to experienced human programmers.  ( 2 min )
    Neural network training under semidefinite constraints. (arXiv:2201.00632v2 [cs.LG] UPDATED)
    This paper is concerned with the training of neural networks (NNs) under semidefinite constraints, which allows for NN training with robustness and stability guarantees. In particular, we set up an efficient and scalable training scheme for NN training problems of this kind based on interior point methods, while we also exploit the structure of the underlying matrix constraint. We apply our training scheme to several relevant examples that have been studied in the literature and newly present the application of the method to the training of Wasserstein generative adversarial networks (WGANs). In numerical examples, we show the superiority of our method and its applicability to WGAN training.  ( 2 min )
    Trading off Accuracy for Speedup: Multiplier Bootstraps for Subgraph Counts. (arXiv:2009.06170v5 [stat.ME] UPDATED)
    We propose a new class of multiplier bootstraps for count functionals, ranging from a fast, approximate linear bootstrap tailored to sparse, massive graphs to a quadratic bootstrap procedure that offers refined accuracy for smaller, denser graphs. For the fast, approximate linear bootstrap, we show that $\sqrt{n}$-consistent inference of the count functional is attainable in certain computational regimes that depend on the sparsity level of the graph. Furthermore, even in more challenging regimes, we prove that our bootstrap procedure offers valid coverage and vanishing confidence intervals. For the quadratic bootstrap, we establish an Edgeworth expansion and show that this procedure offers higher-order accuracy under appropriate sparsity conditions. We complement our theoretical results with a simulation study and real data analysis and verify that our procedure offers state-of-the-art performance for several functionals.  ( 2 min )
    Free Energy Evaluation Using Marginalized Annealed Importance Sampling. (arXiv:2204.03784v1 [stat.ML])
The evaluation of the free energy of a stochastic model is a significant issue in various fields of physics and machine learning. However, exact free energy evaluation is computationally infeasible because it involves an intractable partition function. Annealed importance sampling (AIS) is a type of importance sampling based on the Markov chain Monte Carlo method, similar to simulated annealing, and can effectively approximate the free energy. This study proposes a new AIS-based approach, referred to as marginalized AIS (mAIS). The statistical efficiency of mAIS is investigated in detail from theoretical and numerical perspectives. Based on this investigation, we prove that mAIS is more effective than AIS under a certain condition.  ( 2 min )
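For context, plain AIS anneals samples from a tractable base distribution toward the target through intermediate densities, accumulating importance weights whose mean is an unbiased estimate of the partition-function ratio. A sketch on a 1-D Gaussian, where the schedule, step size, and chain count are our choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Anneal from a standard normal p0 to the unnormalized target
# f1(x) = exp(-(x - 3)^2 / 2); both have Z = sqrt(2*pi), so the
# estimated ratio Z1/Z0 should be close to 1.
def log_p0(x): return -0.5 * x**2
def log_f1(x): return -0.5 * (x - 3.0)**2

betas = np.linspace(0.0, 1.0, 50)
n_chains = 2000
x = rng.standard_normal(n_chains)
log_w = np.zeros(n_chains)

for b_prev, b in zip(betas[:-1], betas[1:]):
    # accumulate the incremental importance weight at the new temperature
    log_w += (b - b_prev) * (log_f1(x) - log_p0(x))
    # one Metropolis step targeting the current intermediate density
    def log_t(y): return (1 - b) * log_p0(y) + b * log_f1(y)
    prop = x + rng.normal(scale=0.5, size=n_chains)
    accept = np.log(rng.random(n_chains)) < log_t(prop) - log_t(x)
    x = np.where(accept, prop, x)

z_ratio = np.exp(log_w).mean()   # estimate of Z1 / Z0
```

The free energy is then `-log Z`; mAIS modifies this scheme by marginalizing over part of the state, which the paper shows improves statistical efficiency under a certain condition.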
    Handling highly correlated genes in prediction analysis of genomic studies. (arXiv:2007.02455v4 [stat.AP] UPDATED)
Background: Selecting feature genes to predict phenotypes is one of the typical tasks in analyzing genomics data. Though many general-purpose algorithms have been developed for prediction, dealing with highly correlated genes in the prediction model is still not well addressed. High correlation among genes introduces technical problems, such as multi-collinearity, leading to unreliable prediction models. Furthermore, when a causal gene (whose variants have an actual biological effect on a phenotype) is highly correlated with other genes, most algorithms select the feature gene from the correlated group in a purely data-driven manner. Since the correlation structure among genes could change substantially when conditions change, a prediction model based on incorrectly selected feature genes is unreliable. Therefore, we aim to keep the causal biological signal in the prediction process and build a more robust prediction model. Method: We propose a grouping algorithm, which treats highly correlated genes as a group and uses their common pattern to represent the group's biological signal in feature selection. Our novel grouping algorithm can be integrated into existing prediction algorithms to enhance their prediction performance. Our proposed grouping method has two advantages. First, using the gene group's common patterns makes the prediction more robust and reliable under changing conditions. Second, it reports whole correlated gene groups as discovered biomarkers for prediction tasks, allowing researchers to conduct follow-up studies to identify causal genes within the identified groups. Result: Using real benchmark scRNA-seq datasets with simulated cell phenotypes, we demonstrate that our novel method significantly outperforms standard models in both (1) prediction of cell phenotypes and (2) feature gene selection.  ( 2 min )
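A minimal sketch of the grouping idea: treat genes whose pairwise correlation exceeds a threshold as one group, which can later be summarized by a common pattern (e.g., its mean profile). The threshold and the greedy single-pass grouping are our simplifications, not the paper's exact algorithm:

```python
import numpy as np

def group_correlated(expr, thresh=0.9):
    # expr: samples x genes. Greedily take each unassigned gene together
    # with every remaining gene highly correlated with it.
    corr = np.corrcoef(expr, rowvar=False)
    unassigned = set(range(corr.shape[0]))
    groups = []
    while unassigned:
        g = unassigned.pop()
        group = [g] + [h for h in unassigned if corr[g, h] > thresh]
        unassigned -= set(group)
        groups.append(sorted(group))
    return groups

rng = np.random.default_rng(0)
base = rng.standard_normal((100, 1))
expr = np.hstack([base + 0.05 * rng.standard_normal((100, 3)),  # correlated trio
                  rng.standard_normal((100, 2))])               # two independent genes

groups = group_correlated(expr)   # trio grouped; independent genes stay singletons
```

Feeding each group's mean profile, rather than one arbitrarily selected member, into a downstream predictor is what keeps the shared biological signal in the model.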
    Seeded graph matching for the correlated Wigner model via the projected power method. (arXiv:2204.04099v1 [math.ST])
In the graph matching problem we observe two graphs $G,H$ and the goal is to find an assignment (or matching) between their vertices such that some measure of edge agreement is maximized. We assume in this work that the observed pair $G,H$ has been drawn from the correlated Wigner model -- a popular model for correlated weighted graphs -- where the entries of the adjacency matrices of $G$ and $H$ are independent Gaussians and each edge of $G$ is correlated with one edge of $H$ (determined by the unknown matching), with the edge correlation described by a parameter $\sigma\in [0,1)$. In this paper, we analyse the performance of the projected power method (PPM) as a seeded graph matching algorithm where we are given an initial partially correct matching (called the seed) as side information. We prove that if the seed is close enough to the ground-truth matching, then with high probability, PPM iteratively improves the seed and recovers the ground-truth matching (either partially or exactly) in $\mathcal{O}(\log n)$ iterations. Our results prove that PPM works even in regimes of constant $\sigma$, thus extending the analysis in (Mao et al., 2021) for the sparse Erdős-Rényi model to the (dense) Wigner model. As a byproduct of our analysis, we see that the PPM framework generalizes some of the state-of-the-art algorithms for seeded graph matching. We support and complement our theoretical findings with numerical experiments on synthetic data.  ( 2 min )

  • Open

    Trippy AI Dream 30 - Howl's Moving Castle Post-Apocalyptic War Scenes VQ...
    submitted by /u/LordPewPew777 [link] [comments]
Is there an AI which can turn images into simple versions of the original image?
    So that the wrinkles and shadows are removed, etc. submitted by /u/xXNOdrugsForMEXx [link] [comments]
    The Singularity is Now
    submitted by /u/ManandMultiverse [link] [comments]
    AI Graphics: Design your dream body with a slider
    submitted by /u/much_successes [link] [comments]  ( 1 min )
Does AI exist that takes an image of a real person and edits/generates photos of them?
    submitted by /u/NootropicLove [link] [comments]
  • Open

    Circular slide rule
    I explained the basics of how a slide rule works in the previous post. But how does a circular slide rule work? Apparently the prop Mr. Spock is holding is an E6B aircraft slide rule. It includes a circular slide rule and more functionality. Start with an ordinary straight slide rule, with each bar labeled […] Circular slide rule first appeared on John D. Cook.  ( 2 min )
    Why a slide rule works
    Suppose you have two sticks. The length of one is log x, and the length of the other is log y. If you put the two sticks end to end, the combined length is log x + log y = log xy. That’s the basic idea behind a slide rule. The simplest slide rule consists […] Why a slide rule works first appeared on John D. Cook.  ( 2 min )
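The stick-addition argument in one line of arithmetic:

```python
import math

x, y = 3.0, 7.0
# Laying the log-x stick against the log-y stick adds their lengths...
combined = math.log10(x) + math.log10(y)
# ...and reading the result back off the scale undoes the log:
product = 10 ** combined
print(product)  # ≈ 21.0
```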
  • Open

    [D] Machine Learning - WAYR (What Are You Reading) - Week 135
This is a place to share machine learning research papers, journals, and articles that you're reading this week. If it relates to what you're researching, by all means elaborate and give us your insight; otherwise it could just be an interesting paper you've read. Please try to provide some insight from your understanding and please don't post things which are present in the wiki. Preferably you should link the arXiv page (not the PDF; you can easily access the PDF from the summary page but not the other way around) or any other pertinent links. Previous weeks: 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, 101-110, 111-120, 121-130, 131-140.  ( 1 min )
    [N]: Dall-E 2 Explained
    submitted by /u/giugiacaglia [link] [comments]  ( 1 min )
    [R] Use static classifiers for dynamic point cloud tasks (3D) and use action classifiers for temporal anomaly detection (2D) - Link to a free online lecture by the author in comments
    submitted by /u/pinter69 [link] [comments]  ( 1 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 2 min )
    [P] image similarity metrics or algorithms
I want to perform image similarity between images from frames of 2 different movie trailers. I am currently using SSIM and VGG16 individually. But SSIM does not capture color differences, and VGG16 isn't capturing structural integrity. I can use both together, but I wanted to know if there is any metric or algorithm which can capture both structure and color together with fewer discrepancies. Will appreciate any help. Thank you! submitted by /u/terminatorash2199 [link] [comments]  ( 1 min )
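One common workaround is simply to blend the two signals into a single score. The sketch below (all numpy; the 0.5 weight and 16 histogram bins are made-up choices) combines a global SSIM-style structural term with a color-histogram intersection. A windowed SSIM and learned features would be stronger, but the blending pattern is the same:

```python
import numpy as np

def combined_similarity(a, b, alpha=0.5):
    """Blend a global SSIM-style structural term with a color-histogram term.

    a, b: float arrays of shape (H, W, 3) with values in [0, 1].
    alpha: weight on the structural term (1 - alpha goes to color).
    """
    # Structural term: SSIM formula applied globally to the luminance channel.
    la, lb = a.mean(axis=2), b.mean(axis=2)
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_a, mu_b = la.mean(), lb.mean()
    va, vb = la.var(), lb.var()
    cov = ((la - mu_a) * (lb - mu_b)).mean()
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2))
    # Color term: per-channel histogram intersection, normalized to [0, 1].
    inter = 0.0
    for ch in range(3):
        ha, _ = np.histogram(a[..., ch], bins=16, range=(0, 1), density=True)
        hb, _ = np.histogram(b[..., ch], bins=16, range=(0, 1), density=True)
        inter += np.minimum(ha, hb).sum() / 16 / 3  # density bins sum to 16
    return alpha * ssim + (1 - alpha) * inter

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
print(combined_similarity(img, img))  # identical frames score ≈ 1.0
```

You could equally replace the color term with a VGG-feature distance; the point is just that a weighted sum lets one score penalize both structural and color mismatches.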
    [R] Interested in a Postdoctoral position bridging machine learning and neuroimaging?
    Recurrent neural networks for brain time series - Sano Centre for Computational Personalised Medicine submitted by /u/alecrimi [link] [comments]
    [R] ML with Intermediate Mathematics: VAEs with Normalized Flow (live series)
Hi everyone, I'd like to share with you an exciting upcoming live series by Prof. Richard Xu of Hong Kong Baptist University. If you're interested, please click here to register! Description: "I have been planning to start a machine learning live series on topics that involve some intermediate mathematics, so I can help you to clarify some concepts. In order to fully grasp these concepts, you need to have sound knowledge of linear algebra, calculus, statistics and probability. However, if you just want to come and hear it for fun, please do so as well! The first topic is variational autoencoders with normalized flow, whose beautiful mathematics I'll fully explain over a period of a few sessions. You can find my notes on my GitHub site: https://github.com/roboticcam/machine-learning-notes/blob/master/files/vb_nf.pdf I will post the Zoom link to the registered participants. Please join us!" submitted by /u/ML_Live_Series [link] [comments]  ( 1 min )
    [research] Issues visualising a Resnet
Hi all, We have a model used in cardiac MRI imaging; it is used to select the best image in a series of images. It consists of images -> ResNet -> LSTM -> output. The heatmap we generate from the ResNet alone shows output like the image attached: instead of actual anatomy, there are only little squares. We think this is likely due to the residual connections in the ResNet, because the effect is not present in a VGG, but does anyone have a better explanation, and an idea of how to visualise a ResNet? Resnet Saliency Map submitted by /u/Radiology_AI [link] [comments]  ( 1 min )
    [R] Critical analysis of deconfounded pretraining to improve visio-linguistic models
    Hi reddit, happy to share our new paper "Critical analysis of deconfounded pretraining to improve visio-linguistic models". In a nutshell, it's on the problem of out-of-distribution performance for visio-linguistic models, and it takes a closer look / surfaces some issues with an existing technique for improving OOD performance by doing automatic deconfounding (inspired by the causality framework of Structural Causal Models). ​ Paper: https://www.frontiersin.org/articles/10.3389/frai.2022.736791/full Code: https://github.com/Natithan/p1_causality Abstract: An important problem with many current visio-linguistic models is that they often depend on spurious correlations. A typical example of a spurious correlation between two variables is one that is due to a third variable causing …  ( 2 min )
    Best Papers That Solve Novel Problems? [D]
    We often talk about how the publish-or-perish paradigm leads to constant minor improvements on the same problems (Image classification, text generation, etc). What are some of the best papers that do the opposite? Rather than solving known problems in a marginally better way, they solve a new problem with known (or modified) methods. submitted by /u/SuspiciousWalrus99 [link] [comments]  ( 2 min )
    [Discussion] Advice on training document layout analysis models
So, a bit of background: I am doing an RnD project on improving the layout analysis of scientific documents. The proposed method is to use an active learning loop on standard object detection models, target those classes/layouts which are performing poorly, and train the model on them. We have some selection strategies based on submodular selection functions to target the pages we want, and I have set up code to extract embeddings which will help me do the selection. But I don't have prior experience in active learning, especially setting it up with detectron2, because detectron2 registers a dataset before training and it is really difficult to change the dataset dynamically in the middle of training, which is my use case. So I need some advice on the following: (1) The document analysis datasets are huge; DocBank is 50GB of images alone. How can I effectively store the embeddings in memory when I call my selection algorithms mentioned above? (2) How do I set up an active learning loop in detectron2 for object detection? Or are there any alternatives? Some resources/code would be better. (3) There is some literature suggesting that simple CNN backbone embeddings represent an image better than Faster R-CNN or Mask R-CNN embeddings; specifically, this paper seems to work on spliced image retrieval and claims the following. Any thoughts/prior experience on this? (4) Finally, is there any evidence supporting improvement in accuracy/precision in object detection using active learning? Or are there some better training paradigms? Thank you for your patience. submitted by /u/ExoticAd6868 [link] [comments]  ( 1 min )
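On the memory and loop questions: a common pattern is to keep embeddings on disk with np.memmap and stream over them in chunks inside the selection step, so only the selected subset ever reaches the trainer (with detectron2, re-registering a fresh dataset between rounds is usually easier than mutating one mid-training). A hypothetical sketch, with greedy k-center selection standing in for the submodular selection functions mentioned; file name, sizes, and the selection rule are all assumptions:

```python
import os
import tempfile
import numpy as np

# Hypothetical setup: embeddings for a large corpus held on disk, not in RAM.
n, d = 10_000, 128
path = os.path.join(tempfile.gettempdir(), "page_embeddings.dat")
emb = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, d))
emb[:] = np.random.default_rng(0).normal(size=(n, d)).astype(np.float32)

def greedy_k_center(emb, k, chunk=2048):
    """Greedy k-center: each pick is the page farthest from everything chosen."""
    chosen = [0]
    min_d = np.full(emb.shape[0], np.inf, dtype=np.float32)
    for _ in range(k - 1):
        center = np.asarray(emb[chosen[-1]])
        # Stream over the memmap in chunks so only `chunk` rows sit in RAM.
        for s in range(0, emb.shape[0], chunk):
            block = np.asarray(emb[s:s + chunk])
            d2 = ((block - center) ** 2).sum(axis=1)
            np.minimum(min_d[s:s + chunk], d2, out=min_d[s:s + chunk])
        chosen.append(int(min_d.argmax()))
    return chosen

picked = greedy_k_center(emb, k=16)
print(len(picked), len(set(picked)))  # 16 distinct, mutually distant pages
```

Swapping in a facility-location or other submodular objective only changes the scoring inside the loop; the chunked streaming over the memmap stays the same.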
    [D] Market Basket Analysis real-world examples and insights?
I want to know more about Market Basket Analysis's real-world use cases and the unique insights/business value derived from performing association rule mining. I heard about the beer-and-diapers case study, but many sources have dismissed it as a spurious correlation. Can someone share an example of business insights from Market Basket Analysis and any interesting patterns they were able to observe? submitted by /u/invincible_moron [link] [comments]  ( 2 min )
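Whatever one makes of the beer-and-diapers legend, the underlying machinery is just counting. A toy sketch of the three standard rule statistics (support, confidence, lift) on hypothetical baskets:

```python
from itertools import combinations
from collections import Counter

# Toy transactions (hypothetical retail baskets).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "diaper"},
    {"beer", "diaper", "chips"},
    {"bread", "milk"},
    {"beer", "chips"},
]
n = len(baskets)
item_count = Counter(i for b in baskets for i in b)
pair_count = Counter(frozenset(p) for b in baskets
                     for p in combinations(sorted(b), 2))

def rule_stats(a, b):
    """Stats for the rule a -> b: support, confidence, lift."""
    support = pair_count[frozenset((a, b))] / n          # P(a and b)
    confidence = support / (item_count[a] / n)            # P(b | a)
    lift = confidence / (item_count[b] / n)               # vs. independence
    return support, confidence, lift

print(rule_stats("beer", "diaper"))  # lift > 1: bought together more than chance
```

A lift well above 1 on a large, stable transaction set is the usual starting point for the actionable insights (placement, bundling, promotions) people claim from this technique; the spurious-correlation critique is exactly about lifts that do not replicate.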
    [D] How are multiple training examples used in DMD, SINDy, etc.?
    The examples I have seen so far for DMD and SINDy use only 1 trajectory of the dynamical system for training. The input data is a 2D matrix, with the states/features being one dimension and time being the other dimension. But I want to use multiple trajectories of the same dynamical system for training, so the training data would be 3D (i.e., multiple 2D matrices). Are there examples where this has been done? Linear regression techniques (like pseudoinverse or LASSO) seem to be used to get the system matrix (in DMD) or the weights for the features (in SINDy). Can these methods be extended to 3D input data? submitted by /u/baigyaanik [link] [comments]  ( 1 min )
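The usual answer is that the regression in DMD only needs snapshot pairs (x_k, x_{k+1}), not whole trajectories, so the 3D stack can be flattened back to 2D by concatenating each trajectory's pairs column-wise; the same flattening applies to SINDy's library regression. A sketch with a known linear system (toy numbers, exact DMD via the pseudoinverse):

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.2], [-0.1, 0.95]])  # hidden linear dynamics

# Several trajectories of the SAME system from different initial conditions.
trajectories = []
for _ in range(5):
    x = rng.normal(size=2)
    traj = [x]
    for _ in range(30):
        x = A_true @ x
        traj.append(x)
    trajectories.append(np.array(traj).T)  # shape (states, time)

# Flatten 3D -> 2D: stack snapshot pairs from every trajectory side by side.
X = np.hstack([t[:, :-1] for t in trajectories])  # all x_k
Y = np.hstack([t[:, 1:] for t in trajectories])   # all x_{k+1}
A_dmd = Y @ np.linalg.pinv(X)
print(np.allclose(A_dmd, A_true, atol=1e-8))
```

The key constraint is simply never to pair the last snapshot of one trajectory with the first snapshot of the next, which the per-trajectory slicing above guarantees.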
  • Open

    6 Business Applications that Badly Need Better AI
The success and growth of AI is undeniable. Yet some basic tasks still perform poorly, despite or because of automation. In some cases, you can blame reliance on outdated AI. In other cases, it is a result of corporate policies or multiple AI systems that compete against each other. The AI systems in question… Read More »6 Business Applications that Badly Need Better AI The post 6 Business Applications that Badly Need Better AI appeared first on Data Science Central.  ( 7 min )
    How exactly do you define Artificial Intelligence(AI)?
How exactly do you define Artificial Intelligence (AI)? This looks like a back-to-basics / back-to-school question, but the answer is not that simple. Recently I was trying to find a good academic definition of AI for a research paper. Surprisingly, it's not easy. In this post, I present a good definition for… Read More »How exactly do you define Artificial Intelligence(AI)? The post How exactly do you define Artificial Intelligence(AI)? appeared first on Data Science Central.  ( 2 min )
    Public wary of Meta’s Metaverse vision
There are signs that Meta’s plans for the metaverse are faltering, including plummeting stock prices and the company’s announcement that it may withdraw from the EU market. The troubles stem from a myriad of issues, the most significant of which are data-collection privacy issues and a lack of investor and public confidence in the… Read More »Public wary of Meta’s Metaverse vision The post Public wary of Meta’s Metaverse vision appeared first on Data Science Central.  ( 4 min )
    NLQ: Why You Might Not Need To Call A Data Analyst Anymore
    NLQ or Natural Language Query or Text-to-SQL or NL2SQL is an arm of computational linguistics, that helps users to fetch required data, visualizations, and insights, from sentences written in human language. As a business user, knowing data schema, table and column names, knowing metadata, having technical know-how of a BI tool or data querying skills… Read More »NLQ: Why You Might Not Need To Call A Data Analyst Anymore The post NLQ: Why You Might Not Need To Call A Data Analyst Anymore appeared first on Data Science Central.  ( 4 min )
    Advance in your finance and accounting careers with top technical skills
A profession in finance and accounting remains one of the top career choices. Employment of accountants and auditors is projected to grow 7 percent from 2020 to 2030, about as fast as the average for all occupations. About 135,000 openings for accountants and auditors are projected each year,… Read More »Advance in your finance and accounting careers with top technical skills The post Advance in your finance and accounting careers with top technical skills appeared first on Data Science Central.  ( 3 min )
  • Open

    Anybody ever programmed a 1st order differential equation model in MuJoCo?
    submitted by /u/SmarterCloud [link] [comments]  ( 1 min )
    Google AI Researchers Propose a Meta-Algorithm, Jump Start Reinforcement Learning, That Uses Prior Policies to Create a Learning Curriculum That Improves Performance
In the field of artificial intelligence, reinforcement learning is a type of machine-learning strategy that rewards desirable behaviors while penalizing those which aren’t. An agent perceives its surroundings and learns to act through trial and error; in effect, it gets feedback on what works. However, learning rules from scratch in contexts with complex exploration problems is a big challenge in RL. Because the agent does not receive any intermediate incentives, it cannot determine how close it is to completing the goal. As a result, it must explore the space at random until, say, a door finally opens. Given the length of the task and the level of precision required, this is highly unlikely to succeed. When prior information is available, randomly exploring the state space can be avoided. This prior knowledge aids the agent in determining which states of the environment are desirable and should be investigated further. Offline data collected from human demonstrations, programmed policies, or other RL agents can be used to train a policy and then initialize a new RL policy. Where neural networks represent the policies, this involves copying the pre-trained policy’s network into the new RL policy, transforming the new RL policy into the pre-trained one. However, as seen below, naively initializing a new RL policy like this frequently fails, especially for value-based RL approaches. Continue reading the summary Paper: https://arxiv.org/pdf/2204.02372.pdf Project: https://jumpstart-rl.github.io/ https://reddit.com/link/u0n5hv/video/fnktgf0wqqs81/player submitted by /u/No_Coffee_4638 [link] [comments]  ( 2 min )
    "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language", Zeng et al 2022
    submitted by /u/gwern [link] [comments]  ( 1 min )
Classical Dynamic Programming vs Policy Iteration
Cracking my head trying to figure out the differences between classical dynamic programming and policy iteration. I understand that policy iteration is itself a form of dynamic programming. But if we compare the traditional operations of dynamic programming with policy iteration, what would the differences be? Thank you heaps submitted by /u/BalramVeeragoo [link] [comments]  ( 1 min )
    Is my understanding to why future rewards being considered correct?
To my understanding, the Q value is updated like this: Q[s,a] = Q[s,a] + lr * (reward + gamma * max(Q[s,a]t+1) - Q[s,a]) where the future state's reward is considered, since the best current reward doesn't guarantee the optimal path. E.g.: Path A: Q[s,a]t = 1, Q[s,a]t+1 = 10, total: 11. Path B: Q[s,a]t = 5, Q[s,a]t+1 = 1, total: 6. Not sure if this is a good analogy, but here it gives the result that A is the optimal path even though its immediate reward at (t) is less than B's. Further question: is there additional benefit to considering future rewards beyond Q[s,a]t+1? Example: Q[s,a] = Q[s,a] + lr * (reward + gamma * max(Q[s,a]t+2) - gamma * max(Q[s,a]t+1) - Q[s,a]) submitted by /u/DangerNoodle314 [link] [comments]  ( 1 min )
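On the further question: with the standard one-step update there is no extra benefit to adding a Q[s,a]t+2 term, because max Q[s,a]t+1 already bootstraps on max Q[s,a]t+2, which bootstraps on t+3, and so on; repeated updates fold in all future rewards. A toy replay of the Path A / Path B comparison (the MDP and reward numbers are made up to match the post's example):

```python
import numpy as np

gamma, lr = 0.9, 0.5

# Hypothetical 3-state MDP mirroring the post's Path A / Path B numbers:
#   state 0 --A--> state 1 (reward 1), then a terminal reward of 10
#   state 0 --B--> state 2 (reward 5), then a terminal reward of 1
Q = np.zeros((3, 2))

def update(s, a, r, s_next, terminal=False):
    target = r + (0.0 if terminal else gamma * Q[s_next].max())
    Q[s, a] += lr * (target - Q[s, a])

for _ in range(200):  # replay both paths until Q stops changing
    update(1, 0, 10, None, terminal=True)  # tail of path A
    update(2, 0, 1, None, terminal=True)   # tail of path B
    update(0, 0, 1, 1)                     # path A's first step
    update(0, 1, 5, 2)                     # path B's first step

print(Q[0])  # Q[0,0] ≈ 10.0 (path A), Q[0,1] ≈ 5.9 (path B)
```

The greedy choice at state 0 is path A despite its smaller immediate reward, which is exactly the point of the bootstrapped term; note the converged values are the gamma-discounted returns (1 + 0.9·10 and 5 + 0.9·1), not the raw sums 11 and 6.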
    Any reason why to use several optimizers in Pytorch implementation of REDQ?
    Hi guys. I am currently implementing REDQ by modifying a working implementation of SAC (basically adapted from Spinup) and so far my implementation doesn't work, I am trying to understand why. By looking at the authors' implementation I notice they use 1 pytorch optimizer per Q network, whereas I only use 1 for all parameters. So I wonder, is there any good reason for using several optimizers here? Thanks! submitted by /u/yannbouteiller [link] [comments]  ( 1 min )
    Learning in Noisy Observation space
I am fairly new to RL. I'm trying to train an agent (like in gym's CartPole env) in an environment with noisy (Gaussian) observations. I have added Gaussian noise to the angle only (not to cart position, cart velocity, or angular velocity). I was fooling around with PPO in stable_baselines but haven't had much luck. Any suggestions on what needs to be tweaked, or any good algorithm for this task? Also, I tried changing the default force magnitude of 10 to other values like 30, but it didn't help much. Thanks submitted by /u/Black_Beard53 [link] [comments]  ( 2 min )
  • Open

    Summer school in between neuroimaging and machine learning
    submitted by /u/pasticciociccio [link] [comments]
    Dall-e and Dall-e2
I have been on the website of OpenAI; it is unclear what has been added in DALL-E 2. Why should we subscribe for a GitHub repo which will be made public? And what is the color code at the bottom? submitted by /u/pasticciociccio [link] [comments]  ( 1 min )
    Researchers, Including Yann Lecun, Propose ‘projUNN’: An Efficient Method For Training Deep Neural Networks With Unitary Matrices
When deep networks or inputs involve extensive data sequences, learning in neural networks can be unstable. Recurrent states in vanilla recurrent neural networks (RNNs) are generated by repeatedly applying a linear transformation followed by a pointwise nonlinearity. This becomes unstable when the linear transformation’s eigenvalues are not of magnitude one. Unitary matrices, whose eigenvalues naturally have magnitude one, have therefore been used to solve the problem of vanishing and exploding gradients. Unitary convolutional layers have recently been developed in a similar way to aid in building more stable deep networks with norm-preserving transformations. The gradient is the derivative of the loss function with respect to the weights. During backpropagation, it is used to update the weights to minimize the loss function. When the derivative or slope shrinks with each layer traveled backward, the result is a vanishing gradient. When the weight update is exponentially small, training takes excessively long; in the worst case, training may stop entirely. Exploding gradients, on the other hand, occur when the slope grows with each successive layer during backpropagation. The gradient then never converges: the large weights make it oscillate around the minima without ever reaching a global minimum. Continue Reading Paper: https://arxiv.org/pdf/2203.05483.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
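The "eigenvalues of magnitude one" claim is easy to check numerically. The snippet below builds a random orthogonal (real unitary) matrix and shows that applying it a thousand times neither shrinks nor blows up a signal, which is the property projUNN aims to preserve during training (this is an illustration only, not projUNN itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random orthogonal matrix (the real-valued case of unitary) via QR.
W, _ = np.linalg.qr(rng.normal(size=(64, 64)))

# Its eigenvalues all lie on the unit circle...
eig = np.linalg.eigvals(W)
print(np.abs(eig).min(), np.abs(eig).max())  # both ≈ 1.0

# ...so applying it repeatedly neither shrinks nor blows up the state,
# unlike a generic weight matrix whose eigenvalues stray from magnitude one.
x = rng.normal(size=64)
h = x.copy()
for _ in range(1000):
    h = W @ h
print(np.linalg.norm(x), np.linalg.norm(h))  # norms agree after 1000 steps
```

The hard part, which projUNN addresses, is keeping a weight matrix on this unitary manifold efficiently while gradient descent updates it.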

  • Open

    Ran 3D art of my AI character thru ArcaneGAN; AI making art of AI.
    submitted by /u/alex-redacted [link] [comments]
    New Technology, Old Problems: The Missing Voices in Natural Language Processing
    submitted by /u/regalalgorithm [link] [comments]
    Check Out This DeepMind’s New Language Model, Chinchilla (70B Parameters), Which Significantly Outperforms Gopher (280B) and GPT-3 (175B) on a Large Range of Downstream Evaluation Tasks
https://preview.redd.it/pkrbloq8vjs81.png?width=1422&format=png&auto=webp&s=fef693165a6c948f626de613e4e341c25f8cf5f4 Extreme-scale language models have recently exhibited incredible performance on natural language processing challenges, thanks to their ever-increasing size, now exceeding 500 billion parameters. However, while these models have grown, the amount of data used to train them has not kept pace: the current generation of huge language models is clearly undertrained. A DeepMind research team has proposed three approaches for optimally choosing both model size and training length: (1) varying model size and the number of training tokens; (2) IsoFLOP profiles; (3) fitting a parametric loss function. The final pretraining loss is modeled as a function of the number of model parameters and training tokens. They minimize this loss under a FLOPs constraint equal to the computational budget, since the computational budget is a deterministic function of the number of training tokens and model parameters. Continue Reading This Research Summary Paper: https://arxiv.org/pdf/2203.15556.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
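The paper's headline prescription can be summarized with the approximation C ≈ 6·N·D for training FLOPs, together with the empirical finding that compute-optimal training uses roughly 20 tokens per parameter (so N and D both scale as sqrt(C)). The constants below are rough and for illustration only:

```python
# Chinchilla rule of thumb: training compute C ≈ 6 * N * D FLOPs, with the
# compute-optimal recipe using about `ratio` ≈ 20 tokens per parameter.

def compute_optimal(C, ratio=20.0):
    """Split a FLOP budget C into parameters N and tokens D with D = ratio*N."""
    N = (C / (6 * ratio)) ** 0.5   # parameters: grows like sqrt(C)
    D = ratio * N                  # training tokens: also grows like sqrt(C)
    return N, D

# Gopher's budget, ~5.76e23 FLOPs:
N, D = compute_optimal(5.76e23)
print(f"N ≈ {N:.2e} params, D ≈ {D:.2e} tokens")
```

Plugging in Gopher's budget lands near Chinchilla's actual 70B-parameter, 1.4T-token configuration, which is why the smaller but longer-trained model outperforms the 280B-parameter Gopher at equal compute.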
    Flaming Rose art made with snowpixelapp using AI.
    submitted by /u/AIWORQART [link] [comments]
    How do you start a professional career in the Affective Computing field?
I'm about to graduate with a master's degree in Computer Science and I'm very passionate about Affective Computing. I would like to start looking for a job in this field, but most companies (excluding consulting) are looking for people with experience or a PhD. What do you recommend I do? Continue with a PhD, or try to find something, maybe at a startup? submitted by /u/_rikya_ [link] [comments]  ( 1 min )
    Deep learning to enable color vision in the dark
    submitted by /u/qptbook [link] [comments]
    Laptop for beginner?
    I'm joining MSc AI & ML this September. I want to buy a laptop. Is MacBook Air sufficient for this? If not what would you recommend to someone like me? submitted by /u/RauhanSheikh [link] [comments]  ( 1 min )
    How can I help the advancement of AI? I want to contribute and make this my career. What should I do?
    Please give a thorough and in-depth response. submitted by /u/trillswan [link] [comments]  ( 1 min )
  • Open

    Gizmo is eating a clothes basket
    submitted by /u/mspurplekris [link] [comments]
    "Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale", Ramrakhya et al 2022 {FB} (log-scaling of crowdsourced imitation learning in VR robotics)
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Is it possible to implement ACER with A2C?
I'm looking into implementing a replay buffer in A2C. I came upon the ACER [paper](https://arxiv.org/pdf/1611.01224.pdf). From my understanding, ACER is an extension of A3C, and the difference between A2C and A3C is that in A2C parameters are updated synchronously, which helps with big batch sizes. Is it still possible to implement some kind of replay buffer on A2C? Are there any papers on implementing a replay buffer with A2C that you would recommend I read? I'm new to the area of reinforcement learning, so I would be very grateful for any kind of help you can offer. Thanks in advance submitted by /u/lebr0n99 [link] [comments]  ( 1 min )
    Does anyone have a link to 'The RL Discord Server'
    Supposedly there is a popular discord server for the RL community, however I am having difficulty finding it. submitted by /u/jclaessens [link] [comments]
    "Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning", Qi et al 2022
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Reinforcement Learning - looking for some resources
    Hello friends, I'm looking for some resources that would let me quickly start with Reinforcement Learning (preferably in Python). I have some experience with supervised learning (e.g. deep nets) and would like to complement with some RL. Preferably a walkthrough with some examples of implementation. Can you recommend something? Thanks in advance! submitted by /u/andy-codes [link] [comments]  ( 1 min )
    I'm dumb at maths: what does this mean?
So I'm learning about Q-learning and have it all understood except for the max thingy. If you care enough to click, here's the blog (it's not my blog): https://towardsdatascience.com/simple-reinforcement-learning-q-learning-fcddc4b6fe56 I don't know how to turn this into a real example: Update Q values: Q[state, action] = Q[state, action] + lr * (reward + gamma * np.max(Q[new_state, :]) - Q[state, action]) Specifically the last bit: np.max(Q[new_state, :]) - Q[state, action]. What does the numpy max actually operate on here? Any hard examples? Thanks. submitted by /u/Togfox [link] [comments]  ( 2 min )
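A concrete answer, with a made-up Q table: Q[new_state, :] is the row of Q values for the state you just landed in, and np.max picks the largest entry in that row, i.e. the value of the best action available from the new state:

```python
import numpy as np

# 3 states x 2 actions; the numbers are made up for illustration.
Q = np.array([[0.0, 0.0],
              [1.0, 4.0],
              [0.0, 0.0]])
lr, gamma = 0.1, 0.9
state, action, reward, new_state = 0, 1, 2.0, 1

# np.max(Q[new_state, :]) scans the ROW for new_state (here [1.0, 4.0]):
best_next = np.max(Q[new_state, :])        # max(1, 4) = 4.0
td_target = reward + gamma * best_next     # 2 + 0.9*4 = 5.6
td_error = td_target - Q[state, action]    # 5.6 - 0 = 5.6
Q[state, action] += lr * td_error
print(Q[state, action])  # ≈ 0.56
```

So the update nudges Q[state, action] a fraction lr of the way toward "immediate reward plus discounted value of the best follow-up action".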
  • Open

    [R] Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
    #cvpr-2022 Happy to share our CVPR-2022 paper "Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning" Paper: https://arxiv.org/pdf/2111.14213.pdf Code: https://github.com/mmendiet/FedAlign Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices). However, the data distribution among clients is often non-IID in nature, making efficient optimization difficult. To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model. Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. To this end, we first present a systematic study informed by second-order indicators to better understand algorithm effectiveness in FL. Interestingly, we find that standard regularization methods are surprisingly strong performers in mitigating data heterogeneity effects. Based on our findings, we further propose a simple and effective method, FedAlign, to overcome data heterogeneity and the pitfalls of previous methods. FedAlign achieves competitive accuracy with state-of-the-art FL methods across a variety of settings while minimizing computation and memory overhead. submitted by /u/Extension-Sun1816 [link] [comments]  ( 1 min )
    [D] ICML2022 Domain conflicts system
I was wondering if the domain conflicts system is working well. I got an email from the PCs, and it seems that it is not. It said that we can enter the conflicts now, but I cannot edit the conflicts in the system. Could anyone tell me how to do it? Thanks! Dear ICML Authors, As we are seeing this happen, we just wanted to send you a brief explanation -- this only applies to some few papers. A few papers are losing reviews because of newly arising conflicts. If you did not enter your conflicts in CMT during the submission phase (as requested via CMT), then they could not be used in paper assignments. If you enter them now, any reviews by reviewers with conflicting domains will disappear, and you may see fewer reviews as a result. Unfortunately, we have no control over this, as the conflicts should have been entered when the paper was submitted. Best, Stefanie, Le and Csaba submitted by /u/Snoo_97274 [link] [comments]  ( 1 min )
    [D] Poll: How do you deploy models & endpoints?
    View Poll submitted by /u/martolini [link] [comments]  ( 1 min )
    [R][P] Generate images from text with Latent Diffusion LAION-400M Model + Gradio Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [P] tinydl - library to help with hyperparameter search and metric reporting in pytorch
Hi everyone, I built a small library to help with hyperparameter search for deep learning models created with pytorch, because I got kinda tired of having to rewrite large pieces of code over and over again. You can check it out here: https://github.com/michi-jeremias/tinydl or you can even install it with pip (pip install tinydl). I have included a readme and an example of how the library can be used. About the library: it's pretty flexible about reporting different metrics to the console and to tensorboard (add_scalar, add_hparam) at each stage of the process, like after a batch, an epoch, or a whole run over multiple epochs. It can also be easily extended to include other metrics or new types of outputs. Since this is basically my first attempt at a software project that's not intended only to be used by myself, I'd be happy about any feedback you have for me! If the project doesn't qualify to be posted here due to being too simple/too much on a beginner level, apologies for that. submitted by /u/abacaxiquaxi [link] [comments]  ( 1 min )
    [D] StyleGAN2 Path Length Regularization Implementation Clarification
I am trying to implement StyleGAN2 and there are so many things here that are not explained well, or at all, in the paper. How exactly is path length regularization implemented? In this PyTorch code we can see that $|J^T_w y|$ is computed as follows:

import math
import torch
from torch import autograd

def g_path_regularize(fake_img, latents, mean_path_length, decay=0.01):
    noise = torch.randn_like(fake_img) / math.sqrt(
        fake_img.shape[2] * fake_img.shape[3]
    )
    grad, = autograd.grad(
        outputs=(fake_img * noise).sum(), inputs=latents, create_graph=True
    )
    path_lengths = torch.sqrt(grad.pow(2).sum(2).mean(1))
    path_mean = mean_path_length + decay * (path_lengths.mean() - mean_path_length)
    path_penalty = (path_lengths - path_mean).pow(2).mean()
    return path_penalty, path_mean.detach(), path_lengths

This is based on this official TF implementation. The problem I have is that, from what I understand, fake_img is 4D and latents is 2D. The grad output in this case will then be 2D, and grad.pow(2).sum(2) cannot be computed because the third axis does not exist. Obviously people using these repos have not reported any issue regarding mismatched shapes and axes, so I believe there is something else going on. Since I'm trying to implement this in my own network, I cannot get the desired shape anyhow; I get a 2D gradient output. submitted by /u/feryet [link] [comments]  ( 1 min )
    [P] Jax/Haiku pretrained models: MobileNet, ResNet, VGG, Xception.
    I released a repository of models with optional pretrained weights(Weights are taken from TF/Keras) to be used for tasks like prediction, feature extraction and fine-tuning. Github: https://github.com/abarcel/haikumodels Currently Available Models MobileNet ResNet [50, 101, 152] VGG [16, 19] Xception Also planning to release more, as soon as I find time for it. submitted by /u/abarcel [link] [comments]
    [D] Denoising in the latent space
I spent some time reading about and playing around with speech denoising DNNs around 2019. At the time the popular architecture was U-Net (encoder -> bottleneck -> decoder with skip connections) operating on spectrograms. These U-Nets were trained directly on noisy/clean speech pairs, and the loss was the difference between the predicted denoised spectrogram and the actual clean one. MSE between the predicted/actual images was a baseline loss, but people also added a "feature loss" or sometimes a GAN-based loss as well. Anyway, a cursory reading of the DALL-E 2 paper has me thinking about that approach. I'm curious to know if a similar approach has been tried for audio denoising: pre-train an encoder/decoder in a self-supervised fashion on a large dataset of audio; train a denoiser to operate only in the latent space (i.e. the most compressed representation that is passed from the encoder to the decoder). step 1 - self-supervised training of encoder/decoder https://preview.redd.it/nprc40ob9gs81.png?width=1668&format=png&auto=webp&s=3a9b181ceff6c4530b5f41abf793dfb6409c0ec2 step 2 - train denoiser in latent space only https://preview.redd.it/hreajdpe9gs81.png?width=1279&format=png&auto=webp&s=e16490fff00fe82487ca214b11b642ffcb30fb1c step 3 - do inference by feeding the denoised latent-space vector into the decoder https://preview.redd.it/7nox4ezh9gs81.png?width=2034&format=png&auto=webp&s=ccc69bbb2096d84a9e4a824000e62bb0f80fbe29 Is this a common approach already? It seems like once you have a good pretrained encoder/decoder pair, the denoiser training would be much more efficient than training an entire network that does everything at once from scratch (smaller search space, faster training loop). submitted by /u/The_Amp_Walrus [link] [comments]  ( 1 min )
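The three-step structure is easy to test on a linear toy problem, with PCA standing in for the pretrained encoder/decoder and ridge regression for the latent-space denoiser (all numbers hypothetical; real systems would use deep nets at both stages, but the division of labor is the same):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "audio": 2000 signals of length 64 living on an 8-dim subspace.
basis = rng.normal(size=(8, 64))
clean = rng.normal(size=(2000, 8)) @ basis
noisy = clean + 0.5 * rng.normal(size=clean.shape)

# Step 1: "pretrain" a linear encoder/decoder (PCA) on clean data only.
u, s, vt = np.linalg.svd(clean - clean.mean(0), full_matrices=False)
E = vt[:8].T   # encoder: signal (64) -> latent (8)
D = vt[:8]     # decoder: latent (8) -> signal (64)

# Step 2: train the denoiser ONLY in the 8-dim latent space (ridge regression).
Zn, Zc = noisy @ E, clean @ E
W = np.linalg.solve(Zn.T @ Zn + 1e-3 * np.eye(8), Zn.T @ Zc)

# Step 3: inference = encode -> denoise in latent space -> decode.
test_clean = rng.normal(size=(200, 8)) @ basis
test_noisy = test_clean + 0.5 * rng.normal(size=test_clean.shape)
recon = (test_noisy @ E @ W) @ D

err_before = np.mean((test_noisy - test_clean) ** 2)
err_after = np.mean((recon - test_clean) ** 2)
print(err_before, err_after)  # latent-space denoising cuts the error sharply
```

The efficiency intuition from the post shows up directly here: the denoiser only has to learn an 8x8 map instead of a 64x64 one, because the frozen encoder/decoder already carries the heavy representation work.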
    [Discussion] MLOps vs Platform Engineering
Hey guys, I have the opportunity to either move to the platform engineering team or the freshly created MLOps team within my company. I'm interested in both careers, as I like building infra. I'm currently a data engineer, and I find that I like building apps and enabling applications to talk to each other more than cleaning up data. I worked as a data scientist before, but I didn't like the science; I was always into engineering. What would make sense from a career perspective (both long and short term) in terms of money, promotions, attractiveness, etc.? submitted by /u/dash2392 [link] [comments]  ( 1 min )
  • Open

    Distributed Reinforcement Learning for Robot Teams: A Review. (arXiv:2204.03516v1 [cs.RO])
    Purpose of review: Recent advances in sensing, actuation, and computation have opened the door to multi-robot systems consisting of hundreds/thousands of robots, with promising applications to automated manufacturing, disaster relief, harvesting, last-mile delivery, port/airport operations, or search and rescue. The community has leveraged model-free multi-agent reinforcement learning (MARL) to devise efficient, scalable controllers for multi-robot systems (MRS). This review aims to provide an analysis of the state-of-the-art in distributed MARL for multi-robot cooperation. Recent findings: Decentralized MRS face fundamental challenges, such as non-stationarity and partial observability. Building upon the "centralized training, decentralized execution" paradigm, recent MARL approaches include independent learning, centralized critic, value decomposition, and communication learning approaches. Cooperative behaviors are demonstrated through AI benchmarks and fundamental real-world robotic capabilities such as multi-robot motion/path planning. Summary: This survey reports the challenges surrounding decentralized model-free MARL for multi-robot cooperation and existing classes of approaches. We present benchmarks and robotic applications along with a discussion on current open avenues for research.  ( 2 min )
    Spatial Graph Attention and Curiosity-driven Policy for Antiviral Drug Discovery. (arXiv:2106.02190v5 [cs.LG] UPDATED)
    We developed Distilled Graph Attention Policy Network (DGAPN), a reinforcement learning model to generate novel graph-structured chemical representations that optimize user-defined objectives by efficiently navigating a physically constrained domain. The framework is examined on the task of generating molecules that are designed to bind, noncovalently, to functional sites of SARS-CoV-2 proteins. We present a spatial Graph Attention (sGAT) mechanism that leverages self-attention over both node and edge attributes as well as encoding the spatial structure -- this capability is of considerable interest in synthetic biology and drug discovery. An attentional policy network is introduced to learn the decision rules for a dynamic, fragment-based chemical environment, and state-of-the-art policy gradient techniques are employed to train the network with stability. Exploration is driven by the stochasticity of the action space design and the innovation reward bonuses learned and proposed by random network distillation. In experiments, our framework achieved outstanding results compared to state-of-the-art algorithms, while reducing the complexity of paths to chemical synthesis.  ( 2 min )
    Flexible Amortized Variational Inference in qBOLD MRI. (arXiv:2203.05845v2 [eess.IV] UPDATED)
Streamlined qBOLD acquisitions enable experimentally straightforward observations of brain oxygen metabolism. $R_2^\prime$ maps are easily inferred; however, the oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are more ambiguously determined from the data. As such, existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV. This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV. Initially, we create a model that produces an informative voxelwise prior distribution based on synthetic training data. Contrary to prior work, we model the joint distribution of OEF and DBV through a scaled multivariate logit-Normal distribution, which enables the values to be constrained within a plausible range. The prior distribution model is used to train an efficient amortized variational Bayesian inference model. This model learns to infer OEF and DBV by predicting real image data, with little training data required, using the signal equations as a forward model. We demonstrate that our approach enables the inference of smooth OEF and DBV maps, with a physiologically plausible distribution that can be adapted through specification of an informative prior distribution. Other benefits include model comparison (via the evidence lower bound) and uncertainty quantification for identifying image artefacts. Results are demonstrated on a small study comparing subjects undergoing hyperventilation and at rest. We illustrate that the proposed approach allows measurement of gray matter differences in OEF and DBV and enables voxelwise comparison between conditions, where we observe significant increases in OEF and $R_2^\prime$ during hyperventilation.  ( 2 min )
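A minimal sketch of the constraint trick the abstract describes: sampling a correlated pair through a scaled logit-Normal (Gaussian in logit space, squashed and rescaled) keeps every draw inside a plausible range by construction. The ranges, mean and covariance below are assumptions for illustration, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Plausible physiological ranges (assumed for illustration):
oef_range = (0.0, 1.0)    # oxygen extraction fraction
dbv_range = (0.0, 0.2)    # deoxygenated blood volume

# Sample a correlated pair in logit space, then squash and scale.
mean = np.array([0.0, -1.0])
cov = np.array([[1.0, 0.4],
                [0.4, 1.0]])
z = rng.multivariate_normal(mean, cov, size=10_000)

def scaled_sigmoid(x, lo, hi):
    # logistic squash into (0, 1), then affine map into (lo, hi)
    return lo + (hi - lo) / (1.0 + np.exp(-x))

oef = scaled_sigmoid(z[:, 0], *oef_range)
dbv = scaled_sigmoid(z[:, 1], *dbv_range)

# Every draw lies strictly inside the plausible range.
assert oef.min() > oef_range[0] and oef.max() < oef_range[1]
assert dbv.min() > dbv_range[0] and dbv.max() < dbv_range[1]
```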
    First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems. (arXiv:2204.03132v1 [math.OC])
    We consider the problem of computing an equilibrium in a class of nonlinear generalized Nash equilibrium problems (NGNEPs) in which the strategy sets for each player are defined by equality and inequality constraints that may depend on the choices of rival players. While the asymptotic global convergence and local convergence rate of solution procedures have been studied in this setting, the analysis of iteration complexity is still in its infancy. Our contribution is to provide two simple first-order algorithmic frameworks based on the quadratic penalty method and the augmented Lagrangian method, respectively, with an accelerated mirror-prox algorithm as the inner loop. We provide nonasymptotic theoretical guarantees for these algorithms. More specifically, we establish the global convergence rate of our algorithms for solving (strongly) monotone NGNEPs and we provide iteration complexity bounds expressed in terms of the number of gradient evaluations. Experimental results demonstrate the efficiency of our algorithms.  ( 2 min )
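As a toy illustration of the inner loop: with the Euclidean mirror map, mirror-prox reduces to the extragradient method, shown here on the simplest monotone saddle-point problem. This deliberately omits the constraints, penalty terms and augmented Lagrangian that are the paper's actual contribution:

```python
# Extragradient (mirror-prox with the Euclidean mirror map) on the toy
# monotone saddle-point problem min_x max_y x*y, whose unique
# equilibrium is (0, 0).
def F(x, y):
    # Monotone operator of the game: (grad_x, -grad_y) of x*y.
    return y, -x

x, y, eta = 1.0, 1.0, 0.5
for _ in range(200):
    gx, gy = F(x, y)
    x_half, y_half = x - eta * gx, y - eta * gy   # extrapolation step
    gx, gy = F(x_half, y_half)
    x, y = x - eta * gx, y - eta * gy             # update step

assert abs(x) < 1e-3 and abs(y) < 1e-3
```

Plain simultaneous gradient descent/ascent diverges on this problem; the extrapolation step is what makes the iteration contract toward the equilibrium.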
    DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores. (arXiv:2204.03219v1 [eess.AS])
Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Since collecting MOS is time-consuming, accurate MOS prediction models for automatic evaluation would be desirable. In this work, we propose DDOS, a novel MOS prediction model. DDOS utilizes domain adaptive pre-training to further pre-train self-supervised learning models on synthetic speech, and a proposed module is added to model the opinion score distribution of each utterance. With the proposed components, DDOS outperforms previous works on the BVCC dataset, and the zero-shot transfer result on the BC2019 dataset is significantly improved. DDOS also won second place in the Interspeech 2022 VoiceMOS Challenge in terms of system-level score.  ( 2 min )
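A hedged sketch of what "modeling the opinion score distribution" can look like in general: predict a distribution over the 1-5 scores and read off the MOS as its expectation. The function and logits below are hypothetical, not DDOS's actual architecture:

```python
import numpy as np

def mos_from_distribution(logits):
    """Turn per-utterance logits over the 1-5 opinion scores into a
    predicted distribution and its expected value (the MOS)."""
    scores = np.arange(1, 6)
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    return p, float(p @ scores)

# Hypothetical model output for one utterance.
p, mos = mos_from_distribution(np.array([0.1, 0.5, 2.0, 1.2, 0.2]))
assert np.isclose(p.sum(), 1.0)
assert 1.0 <= mos <= 5.0
```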
    Unsupervised Image-to-Image Translation with Generative Prior. (arXiv:2204.03641v1 [cs.CV])
    Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data. Despite the recent progress in image translation models, it remains challenging to build mappings between complex domains with drastic visual discrepancies. In this work, we present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm. Our key insight is to leverage the generative prior from pre-trained class-conditional GANs (e.g., BigGAN) to learn rich content correspondences across various domains. We propose a novel coarse-to-fine scheme: we first distill the generative prior to capture a robust coarse-level content representation that can link objects at an abstract semantic level, based on which fine-level content features are adaptively learned for more accurate multi-level content correspondences. Extensive experiments demonstrate the superiority of our versatile framework over state-of-the-art methods in robust, high-quality and diversified translations, even for challenging and distant domains.  ( 2 min )
    SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis. (arXiv:2204.03040v1 [cs.SD])
    In this work, we present the SOMOS dataset, the first large-scale mean opinion scores (MOS) dataset consisting of solely neural text-to-speech (TTS) samples. It can be employed to train automatic MOS prediction systems focused on the assessment of modern synthesizers, and can stimulate advancements in acoustic model evaluation. It consists of 20K synthetic utterances of the LJ Speech voice, a public domain speech dataset which is a common benchmark for building neural acoustic models and vocoders. Utterances are generated from 200 TTS systems including vanilla neural acoustic models as well as models which allow prosodic variations. An LPCNet vocoder is used for all systems, so that the samples' variation depends only on the acoustic models. The synthesized utterances provide balanced and adequate domain and length coverage. We collect MOS naturalness evaluations on 3 English Amazon Mechanical Turk locales and share practices leading to reliable crowdsourced annotations for this task. Baseline results of state-of-the-art MOS prediction models on the SOMOS dataset are presented, while we show the challenges that such models face when assigned to evaluate synthetic utterances.  ( 2 min )
    Robust and Explainable Autoencoders for Unsupervised Time Series Outlier Detection---Extended Version. (arXiv:2204.03341v1 [cs.LG])
Time series data occurs widely, and outlier detection is a fundamental problem in data mining with numerous applications. Existing autoencoder-based approaches deliver state-of-the-art performance on challenging real-world data but are vulnerable to outliers and exhibit low explainability. To address these two limitations, we propose robust and explainable unsupervised autoencoder frameworks that decompose an input time series into a clean time series and an outlier time series using autoencoders. Improved explainability is achieved because clean time series are better explained with easy-to-understand patterns such as trends and periodicities. We provide insight into this by means of a post-hoc explainability analysis and empirical studies. In addition, since outliers are separated from clean time series iteratively, our approach offers improved robustness to outliers, which in turn improves accuracy. We evaluate our approach on five real-world datasets and report improvements over the state-of-the-art approaches in terms of robustness and explainability. This is an extended version of "Robust and Explainable Autoencoders for Unsupervised Time Series Outlier Detection", to appear in IEEE ICDE 2022.  ( 2 min )
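The clean-plus-outlier decomposition can be illustrated with a much simpler stand-in for the autoencoder: smooth the series, flag large residuals as outliers, and iterate. The moving-average smoother, MAD-based threshold and all constants below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def decompose(x, window=5, tau=3.0, iters=5):
    """Iteratively split a series into clean + outlier components:
    smooth the outlier-free part, flag large residuals as outliers."""
    outlier = np.zeros_like(x)
    for _ in range(iters):
        filled = x - outlier
        # moving-average smoother as a stand-in for the autoencoder
        kernel = np.ones(window) / window
        clean = np.convolve(filled, kernel, mode="same")
        resid = x - clean
        # robust noise scale via the median absolute deviation
        sigma = np.median(np.abs(resid)) / 0.6745 + 1e-12
        outlier = np.where(np.abs(resid) > tau * sigma, resid, 0.0)
    return clean, outlier

t = np.linspace(0, 4 * np.pi, 400)
x = np.sin(t)
x[100] += 5.0                      # planted spike
clean, outlier = decompose(x)
assert abs(outlier[100]) > 1.0     # the spike lands in the outlier series
```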
    Mo\"ET: Mixture of Expert Trees and its Application to Verifiable Reinforcement Learning. (arXiv:1906.06717v4 [cs.LG] UPDATED)
Rapid advancements in deep learning have led to many recent breakthroughs. While deep learning models achieve superior performance, often statistically better than humans, their adoption into safety-critical settings, such as healthcare or self-driving cars, is hindered by their inability to provide safety guarantees or to expose the inner workings of the model in a human-understandable form. We present Mo\"ET, a novel model based on Mixture of Experts, consisting of decision tree experts and a generalized linear model gating function. Thanks to such a gating function, the model is more expressive than the standard decision tree. To support non-differentiable decision trees as experts, we formulate a novel training procedure. In addition, we introduce a hard thresholding version, Mo\"ETH, in which predictions are made solely by a single expert chosen via the gating function. Thanks to that property, Mo\"ETH allows each prediction to be easily decomposed into a set of logical rules in a form which can be easily verified. While Mo\"ET is a general use model, we illustrate its power in the reinforcement learning setting. By training Mo\"ET models using an imitation learning procedure on deep RL agents we outperform the previous state-of-the-art technique based on decision trees while preserving the verifiability of the models. Moreover, we show that Mo\"ET can also be used in real-world supervised problems on which it outperforms other verifiable machine learning models.  ( 3 min )
    Improving Cooperative Game Theory-based Data Valuation via Data Utility Learning. (arXiv:2107.06336v2 [cs.LG] UPDATED)
The Shapley value (SV) and Least core (LC) are classic methods in cooperative game theory for cost/profit sharing problems. Both methods have recently been proposed as a principled solution for data valuation tasks, i.e., quantifying the contribution of individual data points in machine learning. However, both SV and LC suffer from computational challenges due to the need to retrain models on combinatorially many data subsets. In this work, we propose to boost the efficiency of computing the Shapley value or Least core by learning to estimate the performance of a learning algorithm on unseen data combinations. Theoretically, we derive bounds relating the error in the predicted learning performance to the approximation error in SV and LC. Empirically, we show that the proposed method can significantly improve the accuracy of SV and LC estimation.  ( 2 min )
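For context, the standard Monte Carlo permutation-sampling estimator of the Shapley value looks like this; the toy `utility` function below stands in for retraining a model on each data subset, which is exactly the expensive step the paper proposes to predict instead:

```python
import random
import numpy as np

# Hypothetical per-point "signal" and a toy utility: a concave function
# of how much signal a subset carries (stand-in for model accuracy).
signal = {0: 3.0, 1: 1.0, 2: 0.5, 3: 0.0}
def utility(subset):
    return np.sqrt(sum(signal[i] for i in subset))

points = list(signal)
rng = random.Random(0)
n_perm = 2000
sv = {i: 0.0 for i in points}

# Monte Carlo permutation sampling: average each point's marginal
# contribution over random arrival orders.
for _ in range(n_perm):
    perm = points[:]
    rng.shuffle(perm)
    prefix, prev = set(), utility(set())
    for i in perm:
        prefix.add(i)
        cur = utility(prefix)
        sv[i] += (cur - prev) / n_perm
        prev = cur

# Efficiency holds exactly for this estimator: marginals telescope.
assert abs(sum(sv.values()) - (utility(set(points)) - utility(set()))) < 1e-9
# The point carrying the most signal receives the largest value.
assert max(sv, key=sv.get) == 0
```

Each permutation requires one "retraining" per prefix, which is why learning a cheap surrogate for the utility pays off.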
    Federated Learning from Only Unlabeled Data with Class-Conditional-Sharing Clients. (arXiv:2204.03304v1 [cs.LG])
    Supervised federated learning (FL) enables multiple clients to share the trained model without sharing their labeled data. However, potential clients might even be reluctant to label their own data, which could limit the applicability of FL in practice. In this paper, we show the possibility of unsupervised FL whose model is still a classifier for predicting class labels, if the class-prior probabilities are shifted while the class-conditional distributions are shared among the unlabeled data owned by the clients. We propose federation of unsupervised learning (FedUL), where the unlabeled data are transformed into surrogate labeled data for each of the clients, a modified model is trained by supervised FL, and the wanted model is recovered from the modified model. FedUL is a very general solution to unsupervised FL: it is compatible with many supervised FL methods, and the recovery of the wanted model can be theoretically guaranteed as if the data have been labeled. Experiments on benchmark and real-world datasets demonstrate the effectiveness of FedUL. Code is available at https://github.com/lunanbit/FedUL.  ( 2 min )
    GraFN: Semi-Supervised Node Classification on Graph with Few Labels via Non-Parametric Distribution Assignment. (arXiv:2204.01303v2 [cs.LG] UPDATED)
Despite the success of Graph Neural Networks (GNNs) on various applications, GNNs encounter significant performance degradation when the amount of supervision signals, i.e., the number of labeled nodes, is limited, which is expected as GNNs are trained solely based on the supervision obtained from the labeled nodes. On the other hand, the recent self-supervised learning paradigm aims to train GNNs by solving pretext tasks that do not require any labeled nodes, and it has been shown to even outperform GNNs trained with few labeled nodes. However, a major drawback of self-supervised methods is that they fall short of learning class-discriminative node representations, since no labeled information is utilized during training. To this end, we propose a novel semi-supervised method for graphs, GraFN, that leverages few labeled nodes to ensure that nodes belonging to the same class are grouped together, thereby achieving the best of both worlds of semi-supervised and self-supervised methods. Specifically, GraFN randomly samples support nodes from labeled nodes and anchor nodes from the entire graph. Then, it minimizes the difference between two predicted class distributions that are non-parametrically assigned by anchor-support similarity from two differently augmented graphs. We experimentally show that GraFN surpasses both the semi-supervised and self-supervised methods in terms of node classification on real-world graphs. The source code for GraFN is available at https://github.com/Junseok0207/GraFN.  ( 2 min )
    DynLight: Realize dynamic phase duration with multi-level traffic signal control. (arXiv:2204.03471v1 [cs.AI])
Adopting reinforcement learning (RL) for traffic signal control is increasingly popular. Most RL methods use a fixed action interval (denoted t_duration) and actuate or maintain a phase every t_duration, which makes the phase duration less dynamic and flexible. In addition, the actuated phase can be arbitrary, affecting real-world deployment, which requires a fixed cyclical phase structure. To address these challenges, we propose a multi-level traffic signal control framework, DynLight, which uses an optimization method, Max-QueueLength (M-QL), to determine the phase and a deep Q-network to determine the corresponding duration. Based on DynLight, we further propose DynLight-C, which adopts the well-trained deep Q-network of DynLight and replaces M-QL with a fixed cyclical control policy that actuates a set of phases in fixed order to realize a cyclical phase structure. Comprehensive experiments on multiple real-world datasets demonstrate that DynLight achieves a new state-of-the-art. Furthermore, the deep Q-network of DynLight can learn well on determining the phase duration, and DynLight-C demonstrates high performance for deployment.  ( 2 min )
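The M-QL component is simple to state: actuate the phase whose released movements currently hold the longest total queue. A minimal sketch, with made-up phase and movement names:

```python
# Minimal sketch of the Max-QueueLength (M-QL) idea as described in the
# abstract; phase names and queue counts here are purely illustrative.
def max_queue_length_phase(phases, queue):
    """phases: mapping phase -> movements it releases;
    queue: mapping movement -> current queue length (vehicles)."""
    return max(phases, key=lambda p: sum(queue[m] for m in phases[p]))

phases = {
    "NS-through": ["N->S", "S->N"],
    "EW-through": ["E->W", "W->E"],
}
queue = {"N->S": 3, "S->N": 2, "E->W": 9, "W->E": 1}
assert max_queue_length_phase(phases, queue) == "EW-through"
```

In the full framework this selection only picks *which* phase to serve; the learned Q-network decides *how long* to hold it.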
    Multiplayer Performative Prediction: Learning in Decision-Dependent Games. (arXiv:2201.03398v2 [cs.GT] UPDATED)
    Learning problems commonly exhibit an interesting feedback mechanism wherein the population data reacts to competing decision makers' actions. This paper formulates a new game theoretic framework for this phenomenon, called "multi-player performative prediction". We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game. The latter equilibria are arguably more informative, but can be found efficiently only when the game is monotone. We show that under mild assumptions, the performatively stable equilibria can be found efficiently by a variety of algorithms, including repeated retraining and the repeated (stochastic) gradient method. We then establish transparent sufficient conditions for strong monotonicity of the game and use them to develop algorithms for finding Nash equilibria. We investigate derivative free methods and adaptive gradient algorithms wherein each player alternates between learning a parametric description of their distribution and gradient steps on the empirical risk. Synthetic and semi-synthetic numerical experiments illustrate the results.
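A single-player toy (much simpler than the paper's multiplayer setting) of why repeated retraining finds a performatively stable point: when the deployed parameter shifts the data distribution only mildly, retraining is a contraction. The constants here are illustrative:

```python
# The deployed parameter theta shifts the data distribution, whose mean
# becomes mu + eps * theta. "Retraining" under squared loss returns the
# (population) mean of the induced distribution, so repeated retraining
# iterates theta <- mu + eps * theta, a contraction when |eps| < 1 with
# fixed point theta* = mu / (1 - eps): a performatively stable point.
mu, eps = 2.0, 0.5
theta = 0.0
for _ in range(50):
    theta = mu + eps * theta   # retrain on the distribution theta induced

assert abs(theta - mu / (1 - eps)) < 1e-6
```

Note the stable point is generally *not* the minimizer one would pick with foresight of the shift, which is one reason the abstract treats Nash equilibria as the more informative solution concept.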
    Composite Spatial Monte Carlo Integration Based on Generalized Least Squares. (arXiv:2204.03248v1 [stat.CO])
    Although evaluation of the expectations on the Ising model is essential in various applications, this is frequently infeasible because of intractable multiple summations (or integrations). Spatial Monte Carlo integration (SMCI) is a sampling-based approximation, and can provide high-accuracy estimations for such intractable expectations. To evaluate the expectation of a function of variables in a specific region (called target region), SMCI considers a larger region containing the target region (called sum region). In SMCI, the multiple summation for the variables in the sum region is precisely executed, and that in the outer region is evaluated by the sampling approximation such as the standard Monte Carlo integration. It is guaranteed that the accuracy of the SMCI estimator is monotonically improved as the size of the sum region increases. However, a haphazard expansion of the sum region could cause a combinatorial explosion. Therefore, we hope to improve the accuracy without such region expansion. In this study, based on the theory of generalized least squares, a new effective method is proposed by combining multiple SMCI estimators. The validity of the proposed method is demonstrated theoretically and numerically. The results indicate that the proposed method can be effective in the inverse Ising problem (or Boltzmann machine learning).  ( 2 min )
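A hedged numerical sketch of first-order SMCI on a tiny Ising ring: the sum over the target spin is carried out exactly given its sampled neighbours, which Rao-Blackwellizes the naive Monte Carlo estimator. Exact enumeration replaces MCMC here purely to keep the example short; sizes and the inverse temperature are illustrative:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
N, beta = 8, 0.4

# Enumerate the tiny Ising ring exactly so we can draw i.i.d. samples.
states = np.array(list(product([-1, 1], repeat=N)))
energy = -(states * np.roll(states, -1, axis=1)).sum(axis=1)
p = np.exp(-beta * energy)
p /= p.sum()
samples = states[rng.choice(len(states), size=4000, p=p)]

# Naive MC estimate of E[s_0] vs. first-order SMCI (sum region = {0}):
# sum over the target spin exactly, conditioned on its sampled
# neighbours, which for the Ising model gives tanh(beta * (left + right)).
naive = samples[:, 0].mean()
cond = np.tanh(beta * (samples[:, 1] + samples[:, -1]))
smci = cond.mean()

# True value is 0 by spin-flip symmetry; the conditional expectations
# have strictly smaller variance than the raw spins.
assert abs(naive) < 0.1 and abs(smci) < 0.1
assert cond.var() < samples[:, 0].var()
```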
    Automated question generation and question answering from Turkish texts. (arXiv:2111.06476v4 [cs.LG] UPDATED)
    While exam-style questions are a fundamental educational tool serving a variety of purposes, manual construction of questions is a complex process that requires training, experience and resources. Automatic question generation (QG) techniques can be utilized to satisfy the need for a continuous supply of new questions by streamlining their generation. However, compared to automatic question answering (QA), QG is a more challenging task. In this work, we fine-tune a multilingual T5 (mT5) transformer in a multi-task setting for QA, QG and answer extraction tasks using Turkish QA datasets. To the best of our knowledge, this is the first academic work that performs automated text-to-text question generation from Turkish texts. Experimental evaluations show that the proposed multi-task setting achieves state-of-the-art Turkish question answering and question generation performance on TQuADv1, TQuADv2 datasets and XQuAD Turkish split. The source code and the pre-trained models are available at https://github.com/obss/turkish-question-generation.
    MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems. (arXiv:2204.03193v1 [stat.ML])
A new data-driven method for operator learning of stochastic differential equations (SDEs) is proposed in this paper. The central goal is to solve forward and inverse stochastic problems more effectively using limited data. The deep operator network (DeepONet) has been proposed recently for operator learning. Compared to other neural networks that learn functions, it targets the problem of learning nonlinear operators. However, using the original model to learn nonlinear operators for high-dimensional stochastic problems can be challenging. We propose a new multi-resolution autoencoder DeepONet model, referred to as MultiAuto-DeepONet, to deal with this difficulty with the aid of a convolutional autoencoder. The encoder part of the network is designed to reduce the dimensionality as well as discover the hidden features of high-dimensional stochastic inputs. The decoder is designed to have a special structure, i.e. in the form of DeepONet. The first DeepONet in the decoder is designed to reconstruct the input function involving randomness, while the second one is used to approximate the solution of the desired equations. These two DeepONets share a common branch net and have two independent trunk nets. This architecture enables us to deal with multi-resolution inputs naturally. By adding $L_1$ regularization to our network, we found that the outputs from the branch net and the two trunk nets all have sparse structures. This reduces the number of trainable parameters in the neural network, thus making the model more efficient. Finally, we conduct several numerical experiments to illustrate the effectiveness of our proposed MultiAuto-DeepONet model with uncertainty quantification.  ( 2 min )
    On Monte Carlo Tree Search for Weighted Vertex Coloring. (arXiv:2202.01665v2 [cs.LG] UPDATED)
    This work presents the first study of using the popular Monte Carlo Tree Search (MCTS) method combined with dedicated heuristics for solving the Weighted Vertex Coloring Problem. Starting with the basic MCTS algorithm, we gradually introduce a number of algorithmic variants where MCTS is extended by various simulation strategies including greedy and local search heuristics. We conduct experiments on well-known benchmark instances to assess the value of each studied combination. We also provide empirical evidence to shed light on the advantages and limits of each strategy.
    A survey on recently proposed activation functions for Deep Learning. (arXiv:2204.02921v2 [cs.LG] UPDATED)
Artificial neural networks (ANNs), typically referred to as neural networks, are a class of machine learning algorithms that have achieved widespread success, having been inspired by the biological structure of the human brain. Neural networks are inherently powerful due to their ability to learn complex function approximations from data. This generalization ability has been able to impact multidisciplinary areas involving image recognition, speech recognition, natural language processing, and others. Activation functions are a crucial sub-component of neural networks; they define the output of a node in the network given a set of inputs. This survey discusses the main concepts of activation functions in neural networks, including: a brief introduction to deep neural networks; a summary of what activation functions are and how they are used in neural networks; their most common properties; the different types of activation functions; some of the challenges, limitations, and alternative solutions faced by activation functions; concluding with final remarks.
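For concreteness, a few activations such surveys commonly cover, in plain numpy (whether each of these appears in this particular survey is not confirmed here):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small negative slope avoids "dead" units for x < 0
    return np.where(x > 0, x, alpha * x)

def swish(x):
    # a.k.a. SiLU: x * sigmoid(x), smooth and non-monotonic
    return x / (1.0 + np.exp(-x))

def gelu(x):
    # common tanh approximation of the Gaussian Error Linear Unit
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-2.0, 0.0, 3.0])
assert np.allclose(relu(x), [0.0, 0.0, 3.0])
assert np.allclose(leaky_relu(x), [-0.02, 0.0, 3.0])
assert swish(0.0) == 0.0 and abs(gelu(0.0)) < 1e-12
```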
    Covariance matrix preparation for quantum principal component analysis. (arXiv:2204.03495v1 [quant-ph])
    Principal component analysis (PCA) is a dimensionality reduction method in data analysis that involves diagonalizing the covariance matrix of the dataset. Recently, quantum algorithms have been formulated for PCA based on diagonalizing a density matrix. These algorithms assume that the covariance matrix can be encoded in a density matrix, but a concrete protocol for this encoding has been lacking. Our work aims to address this gap. Assuming amplitude encoding of the data, with the data given by the ensemble $\{p_i,| \psi_i \rangle\}$, then one can easily prepare the ensemble average density matrix $\overline{\rho} = \sum_i p_i |\psi_i\rangle \langle \psi_i |$. We first show that $\overline{\rho}$ is precisely the covariance matrix whenever the dataset is centered. For quantum datasets, we exploit global phase symmetry to argue that there always exists a centered dataset consistent with $\overline{\rho}$, and hence $\overline{\rho}$ can always be interpreted as a covariance matrix. This provides a simple means for preparing the covariance matrix for arbitrary quantum datasets or centered classical datasets. For uncentered classical datasets, our method is so-called "PCA without centering", which we interpret as PCA on a symmetrized dataset. We argue that this closely corresponds to standard PCA, and we derive equations and inequalities that bound the deviation of the spectrum obtained with our method from that of standard PCA. We numerically illustrate our method for the MNIST handwritten digit dataset. We also argue that PCA on quantum datasets is natural and meaningful, and we numerically implement our method for molecular ground-state datasets.
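The centered-data claim is easy to check numerically: with amplitude encoding and ensemble weights proportional to squared norms (an assumption made here so the bookkeeping is explicit; the paper derives the general statement), the ensemble-average density matrix equals the trace-normalized covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Centered classical dataset, amplitude-encoded: |psi_i> = x_i / ||x_i||.
X = rng.standard_normal((200, 4))
X -= X.mean(axis=0)                       # center the data

norms2 = (X ** 2).sum(axis=1)
p = norms2 / norms2.sum()                 # ensemble weights p_i ~ ||x_i||^2
psi = X / np.sqrt(norms2)[:, None]

# Ensemble-average density matrix: rho_bar = sum_i p_i |psi_i><psi_i|
rho = np.einsum("i,ij,ik->jk", p, psi, psi)

# ...equals the trace-normalized covariance matrix of the centered data.
cov = X.T @ X / len(X)
assert np.allclose(rho, cov / np.trace(cov))
```

The trace normalization is forced by density matrices having unit trace, so the spectrum is recovered up to an overall scale.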
    On the Effectiveness of Pretrained Models for API Learning. (arXiv:2204.03498v1 [cs.SE])
    Developers frequently use APIs to implement certain functionalities, such as parsing Excel Files, reading and writing text files line by line, etc. Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner. Existing approaches utilize information retrieval models to search for matching API sequences given a query or use RNN-based encoder-decoder to generate API sequences. As it stands, the first approach treats queries and API names as bags of words. It lacks deep comprehension of the semantics of the queries. The latter approach adapts a neural language model to encode a user query into a fixed-length context vector and generate API sequences from the context vector. We want to understand the effectiveness of recent Pre-trained Transformer based Models (PTMs) for the API learning task. These PTMs are trained on large natural language corpora in an unsupervised manner to retain contextual knowledge about the language and have found success in solving similar Natural Language Processing (NLP) problems. However, the applicability of PTMs has not yet been explored for the API sequence generation task. We use a dataset that contains 7 million annotations collected from GitHub to evaluate the PTMs empirically. This dataset was also used to assess previous approaches. Based on our results, PTMs generate more accurate API sequences and outperform other related methods by around 11%. We have also identified two different tokenization approaches that can contribute to a significant boost in PTMs' performance for the API sequence generation task.
    Graph Neural Network-based Android Malware Classification with Jumping Knowledge. (arXiv:2201.07537v5 [cs.CR] UPDATED)
This paper presents a new Android malware detection method based on Graph Neural Networks (GNNs) with Jumping-Knowledge (JK). Android function call graphs (FCGs) consist of a set of program functions and their inter-procedural calls. Thus, this paper proposes a GNN-based method for Android malware detection by capturing meaningful intra-procedural call path patterns. In addition, a Jumping-Knowledge technique is applied to minimize the effect of the over-smoothing problem, which is common in GNNs. The proposed method has been extensively evaluated using two benchmark datasets. The results demonstrate the superiority of our approach compared to state-of-the-art approaches in terms of key classification metrics, highlighting the potential of GNNs in Android malware detection and classification.
    Federated Learning with Erroneous Communication Links. (arXiv:2201.12991v2 [cs.LG] UPDATED)
    In this paper, we consider the federated learning (FL) problem in the presence of communication errors. We model the link between the devices and the central node (CN) by a packet erasure channel, where the local parameters from devices are either erased or received correctly by CN with probability $e$ and $1-e$, respectively. We provide mathematical proof for the convergence of the FL algorithm in the presence of communication errors, where the CN uses past local updates when the fresh updates are not received from some devices. We show via simulations that by using the past local updates, the FL algorithm can converge in the presence of communication errors. We also show that when the dataset is uniformly distributed among devices, the FL algorithm that only uses fresh updates and discards missing updates might converge faster than the FL algorithm that uses past local updates.
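A toy simulation of the scheme described above: a packet-erasure channel drops each local update with probability e, and the central node substitutes that client's last received update. The one-dimensional model, targets and rates are all illustrative stand-ins for real federated training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each client's local step pulls the shared scalar parameter toward its
# own target; the server averages whatever it has per client.
targets = np.array([1.0, 3.0, 5.0, 7.0])
e, lr = 0.3, 0.5                          # erasure probability, local rate
theta = 0.0
last_update = np.zeros(len(targets))      # memory of past local updates

for _ in range(200):
    local = theta + lr * (targets - theta)        # one local step per client
    received = rng.random(len(targets)) >= e      # erasure channel
    last_update = np.where(received, local, last_update)
    theta = last_update.mean()                    # aggregate with past updates

# Converges near the mean of the client targets despite the erasures.
assert abs(theta - targets.mean()) < 0.5
```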
    An Exploration of Active Learning for Affective Digital Phenotyping. (arXiv:2204.01915v2 [cs.LG] UPDATED)
    Some of the most severe bottlenecks preventing widespread development of machine learning models for human behavior include a dearth of labeled training data and difficulty of acquiring high quality labels. Active learning is a paradigm for using algorithms to computationally select a useful subset of data points to label using metrics for model uncertainty and data similarity. We explore active learning for naturalistic computer vision emotion data, a particularly heterogeneous and complex data space due to inherently subjective labels. Using frames collected from gameplay acquired from a therapeutic smartphone game for children with autism, we run a simulation of active learning using gameplay prompts as metadata to aid in the active learning process. We find that active learning using information generated during gameplay slightly outperforms random selection of the same number of labeled frames. We next investigate a method to conduct active learning with subjective data, such as in affective computing, and where multiple crowdsourced labels can be acquired for each image. Using the Child Affective Facial Expression (CAFE) dataset, we simulate an active learning process for crowdsourcing many labels and find that prioritizing frames using the entropy of the crowdsourced label distribution results in lower categorical cross-entropy loss compared to random frame selection. Collectively, these results demonstrate pilot evaluations of two novel active learning approaches for subjective affective data collected in noisy settings.
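The second strategy above is straightforward to sketch: rank frames by the entropy of their crowdsourced label distribution and send the most ambiguous ones for (re)labeling first. The label counts below are made up for illustration:

```python
import numpy as np

def entropy(counts):
    # Shannon entropy (nats) of a normalized count vector.
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())

# Crowdsourced label counts per frame over four emotion categories.
frames = {
    "frame_a": [10, 0, 0, 0],   # annotators agree  -> low entropy
    "frame_b": [3, 3, 2, 2],    # annotators disagree -> high entropy
    "frame_c": [7, 2, 1, 0],
}

# Prioritize the most ambiguous frames first.
ranked = sorted(frames, key=lambda f: entropy(frames[f]), reverse=True)
assert ranked[0] == "frame_b" and ranked[-1] == "frame_a"
```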
    ECMG: Exemplar-based Commit Message Generation. (arXiv:2203.02700v2 [cs.SE] UPDATED)
    Commit messages concisely describe the content of code diffs (i.e., code changes) and the intent behind them. Recently, many approaches have been proposed to generate commit messages automatically. The information retrieval-based methods reuse the commit messages of similar code diffs, while the neural-based methods learn the semantic connection between code diffs and commit messages. However, the reused commit messages might not accurately describe the content/intent of code diffs, and neural-based methods tend to generate high-frequency and repetitive tokens from the corpus. In this paper, we combine the advantages of the two technical routes and propose a novel exemplar-based neural commit message generation model, which treats a similar commit message as an exemplar and leverages it to guide the neural network in generating an accurate commit message. We perform extensive experiments, and the results confirm the effectiveness of our model.
    Causality, Causal Discovery, and Causal Inference in Structural Engineering. (arXiv:2204.01543v2 [cs.LG] UPDATED)
    Many of our experiments are designed to uncover the cause(s) and effect(s) behind a data-generating mechanism (i.e., phenomenon) we happen to be interested in. Uncovering such relationships allows us to identify the true workings of a phenomenon and, most importantly, articulate a model that may enable us to further explore the phenomenon at hand and/or predict it accurately. Fundamentally, such models are likely to be derived via a causal approach (as opposed to observational or empirical means). In this approach, causal discovery is required to create a causal model, which can then be applied to infer the influence of interventions and answer any hypothetical ("what if") questions we might have. This paper builds a case for causal discovery and causal inference and contrasts them against traditional machine learning approaches, all from a civil and structural engineering perspective. More specifically, this paper outlines the key principles of causality and the most commonly used algorithms and packages for causal discovery and causal inference. Finally, this paper also presents a series of examples and case studies of how causal concepts can be adopted for our domain.
    VNIbCReg: VICReg with Neighboring-Invariance and better-Covariance Evaluated on Non-stationary Seismic Signal Time Series. (arXiv:2204.02697v2 [cs.LG] UPDATED)
    One of the latest self-supervised learning (SSL) methods, VICReg, showed strong performance in both the linear evaluation and the fine-tuning evaluation. However, VICReg was proposed for computer vision: it learns by pulling together representations of random crops of an image while maintaining the representation space via the variance and covariance losses. VICReg is therefore likely to be ineffective on non-stationary time series, where different parts/crops of the input should be encoded differently to account for the non-stationarity. Another recent SSL proposal, Temporal Neighborhood Coding (TNC), is effective for encoding non-stationary time series. This study shows that a combination of a VICReg-style method and TNC is very effective for SSL on non-stationary time series, using a non-stationary seismic signal time series as the evaluation dataset.
    Data-Centric Green AI: An Exploratory Empirical Study. (arXiv:2204.02766v2 [cs.LG] UPDATED)
    With the growing availability of large-scale datasets, and the popularization of affordable storage and computational capabilities, the energy consumed by AI is becoming a growing concern. To address this issue, in recent years, studies have focused on demonstrating how AI energy efficiency can be improved by tuning the model training strategy. Nevertheless, how modifications applied to datasets can impact the energy consumption of AI is still an open question. To fill this gap, in this exploratory study, we evaluate if data-centric approaches can be utilized to improve AI energy efficiency. To achieve our goal, we conduct an empirical experiment, executed by considering 6 different AI algorithms, a dataset comprising 5,574 data points, and two dataset modifications (number of data points and number of features). Our results show evidence that, by exclusively conducting modifications on datasets, energy consumption can be drastically reduced (up to 92.16%), often at the cost of a negligible or even absent accuracy decline. As additional introductory results, we demonstrate how, by exclusively changing the algorithm used, energy savings up to two orders of magnitude can be achieved. In conclusion, this exploratory investigation empirically demonstrates the importance of applying data-centric techniques to improve AI energy efficiency. Our results call for a research agenda that focuses on data-centric techniques, to further enable and democratize Green AI.
    Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning. (arXiv:2108.03706v2 [stat.ML] UPDATED)
    The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms. Existing methods for statistical inference in online learning are restricted to settings involving independently sampled observations, while existing statistical inference methods in reinforcement learning (RL) are limited to the batch setting. The online bootstrap is a flexible and efficient approach for statistical inference in linear stochastic approximation algorithms, but its efficacy in settings involving Markov noise, such as RL, has yet to be explored. In this paper, we study the use of the online bootstrap method for statistical inference in RL. In particular, we focus on the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms, which are themselves special instances of linear stochastic approximation under Markov noise. The method is shown to be distributionally consistent for statistical inference in policy evaluation, and numerical experiments are included to demonstrate the effectiveness of this algorithm at statistical inference tasks across a range of real RL environments.
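As a toy illustration of the online bootstrap idea for TD(0): each bootstrap replicate reweights every incremental update with a random mean-one multiplier, producing a distribution of value estimates alongside the point estimate. The single-state reward stream, the Uniform(0, 2) weights, and all parameter values are assumptions for illustration, not the paper's setup.

```python
import random

def online_bootstrap_td(rewards, alpha=0.1, gamma=0.9, n_boot=20, seed=0):
    """TD(0) value estimate for a single recurring state, plus n_boot
    bootstrap replicates whose updates are scaled by random mean-one
    weights; the spread of the replicates supports interval estimation."""
    rng = random.Random(seed)
    v = 0.0
    reps = [0.0] * n_boot
    for r in rewards:
        v += alpha * (r + gamma * v - v)        # standard TD(0) update
        for b in range(n_boot):
            w = 2.0 * rng.random()              # mean-one multiplier
            reps[b] += alpha * w * (r + gamma * reps[b] - reps[b])
    return v, reps
```

With a constant reward of 1 the point estimate approaches 1/(1 - γ) = 10, and the replicates scatter around it; quantiles of the replicates give a confidence interval without re-running the learner.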
    Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings. (arXiv:2111.00185v2 [cs.LG] UPDATED)
    Policy gradient methods have been frequently applied to problems in control and reinforcement learning with great success, yet existing convergence analysis still relies on non-intuitive, impractical and often opaque conditions. In particular, existing rates are achieved in limited settings, under strict regularity conditions. In this work, we establish explicit convergence rates of policy gradient methods, extending the convergence regime to weakly smooth policy classes with $L_2$ integrable gradient. We provide intuitive examples to illustrate the insight behind these new conditions. Notably, our analysis also shows that convergence rates are achievable for both the standard policy gradient and the natural policy gradient algorithms under these assumptions. Lastly we provide performance guarantees for the converged policies.
    Motion-from-Blur: 3D Shape and Motion Estimation of Motion-blurred Objects in Videos. (arXiv:2111.14465v2 [cs.CV] UPDATED)
    We propose a method for jointly estimating the 3D motion, 3D shape, and appearance of highly motion-blurred objects from a video. To this end, we model the blurred appearance of a fast moving object in a generative fashion by parametrizing its 3D position, rotation, velocity, acceleration, bounces, shape, and texture over the duration of a predefined time window spanning multiple frames. Using differentiable rendering, we are able to estimate all parameters by minimizing the pixel-wise reprojection error to the input video via backpropagating through a rendering pipeline that accounts for motion blur by averaging the graphics output over short time intervals. For that purpose, we also estimate the camera exposure gap time within the same optimization. To account for abrupt motion changes like bounces, we model the motion trajectory as a piece-wise polynomial, and we are able to estimate the specific time of the bounce at sub-frame accuracy. Experiments on established benchmark datasets demonstrate that our method outperforms previous methods for fast moving object deblurring and 3D reconstruction.
    Federated Learning of Generative Image Priors for MRI Reconstruction. (arXiv:2202.04175v2 [eess.IV] UPDATED)
    Multi-institutional efforts can facilitate training of deep MRI reconstruction models, albeit privacy risks arise during cross-site sharing of imaging data. Federated learning (FL) has recently been introduced to address privacy concerns by enabling distributed training without transfer of imaging data. Existing FL methods for MRI reconstruction employ conditional models to map from undersampled to fully-sampled acquisitions via explicit knowledge of the imaging operator. Since conditional models generalize poorly across different acceleration rates or sampling densities, imaging operators must be fixed between training and testing, and they are typically matched across sites. To improve generalization and flexibility in multi-institutional collaborations, here we introduce a novel method for MRI reconstruction based on Federated learning of Generative IMage Priors (FedGIMP). FedGIMP leverages a two-stage approach: cross-site learning of a generative MRI prior, and subject-specific injection of the imaging operator. The global MRI prior is learned via an unconditional adversarial model that synthesizes high-quality MR images based on latent variables. Specificity in the prior is preserved via a mapper subnetwork that produces site-specific latents. During inference, the prior is combined with subject-specific imaging operators to enable reconstruction, and further adapted to individual test samples by minimizing data-consistency loss. Comprehensive experiments on multi-institutional datasets clearly demonstrate enhanced generalization performance of FedGIMP against site-specific and federated methods based on conditional models, as well as traditional reconstruction methods.
    Machine Learning based Medical Image Deepfake Detection: A Comparative Study. (arXiv:2109.12800v2 [cs.CV] UPDATED)
    Deep generative networks in recent years have reinforced the need for caution while consuming various modalities of digital information. One avenue of deepfake creation involves the injection and removal of tumors from medical scans. Failure to detect medical deepfakes can lead to large setbacks on hospital resources or even loss of life. This paper attempts to address the detection of such attacks with a structured case study. Specifically, we evaluate eight different machine learning algorithms, including three conventional machine learning methods (support vector machine, random forest, decision tree) and five deep learning models (DenseNet121, DenseNet201, ResNet50, ResNet101, VGG19), on distinguishing between tampered and untampered images. The five deep learning models are used for feature extraction, and each pre-trained model is then fine-tuned. The findings of this work show near-perfect accuracy in detecting instances of tumor injections and removals.
    Reinforcement Learning with Almost Sure Constraints. (arXiv:2112.05198v2 [cs.LG] UPDATED)
    In this work we address the problem of finding feasible policies for Constrained Markov Decision Processes under probability one constraints. We argue that stationary policies are not sufficient for solving this problem, and that a rich class of policies can be found by endowing the controller with a scalar quantity, a so-called budget, that tracks how close the agent is to violating the constraint. We show that the minimal budget required to act safely can be obtained as the smallest fixed point of a Bellman-like operator, for which we analyze its convergence properties. We also show how to learn this quantity when the true kernel of the Markov decision process is not known, while providing sample-complexity bounds. The utility of knowing this minimal budget lies in the fact that it can aid the search for optimal or near-optimal policies by shrinking the region of the state space the agent must navigate. Simulations illustrate the different nature of probability one constraints compared with the typically used constraints in expectation.
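The smallest fixed point of a Bellman-like operator can be illustrated by iterating the operator from the all-zeros function on a small deterministic MDP: the minimal budget at a state is the violation incurred now plus the best successor's budget. The state/action encoding and function names below are assumptions for illustration, not the paper's formulation.

```python
def minimal_budget(states, actions, step, violation, terminal, n_iter=100):
    """Iterate D <- T(D) from D = 0, where
    T(D)(s) = violation(s) + min_a D(step(s, a)) and D = 0 at terminal
    states; monotone iteration from the bottom reaches the smallest
    fixed point on a finite state space."""
    D = {s: 0.0 for s in states}
    for _ in range(n_iter):
        D = {s: 0.0 if terminal(s)
             else violation(s) + min(D[step(s, a)] for a in actions)
             for s in states}
    return D
```

On a 4-state chain where only state 1 incurs a violation, every path from state 0 must pass through it, so the minimal budget at states 0 and 1 is 1 and it is 0 downstream.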
    Quantum Distributed Deep Learning Architectures: Models, Discussions, and Applications. (arXiv:2202.11200v3 [quant-ph] UPDATED)
    Although deep learning (DL) has already become a state-of-the-art technology for various data processing tasks, data security and computational overload problems often arise due to its high data and computational power dependency. To solve this problem, quantum deep learning (QDL) and distributed deep learning (DDL) have emerged to complement existing DL methods. Furthermore, a quantum distributed deep learning (QDDL) technique that combines and maximizes these advantages is gaining attention. This paper compares several model structures for QDDL and discusses their possibilities and limitations in leveraging QDDL for some representative application scenarios.
    Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial Diffusion Geometry. (arXiv:2103.15783v2 [cs.LG] UPDATED)
    Clustering algorithms partition a dataset into groups of similar points. The primary contribution of this article is the Multiscale Spatially-Regularized Diffusion Learning (M-SRDL) clustering algorithm, which uses spatially-regularized diffusion distances to efficiently and accurately learn multiple scales of latent structure in hyperspectral images. The M-SRDL clustering algorithm extracts clusterings at many scales from a hyperspectral image and outputs the variation-of-information barycenter of these clusterings as an exemplar for all underlying cluster structure. We show that incorporating spatial regularization into a multiscale clustering framework results in smoother and more coherent clusters when applied to hyperspectral data, yielding more accurate clustering labels.
    SplitAVG: A heterogeneity-aware federated deep learning method for medical imaging. (arXiv:2107.02375v4 [cs.LG] UPDATED)
    Federated learning is an emerging research paradigm for collaboratively training deep learning models without sharing patient data. However, data are usually heterogeneous across institutions, which may reduce the performance of models trained using federated learning. In this study, we propose a novel heterogeneity-aware federated learning method, SplitAVG, to overcome the performance drops caused by data heterogeneity in federated learning. Unlike previous federated methods that require complex heuristic training or hyperparameter tuning, SplitAVG leverages simple network splitting and feature map concatenation strategies to encourage the federated model to train an unbiased estimator of the target data distribution. We compare SplitAVG with seven state-of-the-art federated learning methods, using centrally hosted training data as the baseline, on a suite of both synthetic and real-world federated datasets. We find that the performance of models trained using all the comparison federated learning methods degrades significantly with increasing degrees of data heterogeneity. In contrast, SplitAVG achieves results comparable to the baseline under all heterogeneous settings: on highly heterogeneous data partitions, it reaches 96.2% of the accuracy and 110.4% of the mean absolute error obtained by the baseline on a diabetic retinopathy binary classification dataset and a bone age prediction dataset, respectively. We conclude that SplitAVG can effectively overcome the performance drops caused by variability in data distributions across institutions. Experimental results also show that SplitAVG can be adapted to different base networks and generalized to various types of medical imaging tasks.
    GFlowNet Foundations. (arXiv:2111.09266v2 [cs.LG] UPDATED)
    Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single but trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions and modular energy functions.
    Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. (arXiv:2111.14826v2 [cs.CV] UPDATED)
    The nonuniform quantization strategy for compressing neural networks usually achieves better performance than its uniform counterpart due to its superior representational capacity. However, many nonuniform quantization methods overlook the complicated projection process required to implement the nonuniformly quantized weights/activations, which incurs non-negligible time and space overhead in hardware deployment. In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that maintains the strong representational ability of nonuniform methods while being as hardware-friendly and efficient as uniform quantization for model inference. We achieve this by learning flexible, non-equidistant input thresholds to better fit the underlying distribution while quantizing these real-valued inputs into equidistant output levels. To train the quantized network with learnable input thresholds, we introduce a generalized straight-through estimator (G-STE) for the intractable backward derivative calculation w.r.t. the threshold parameters. Additionally, we consider entropy-preserving regularization to further reduce information loss in weight quantization. Even under the adverse constraint of imposing uniformly quantized weights and activations, our N2UQ outperforms state-of-the-art nonuniform quantization methods by 0.5~1.7 on ImageNet, demonstrating the contribution of the N2UQ design. Code and models are available at: https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization.
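The forward pass of the nonuniform-to-uniform idea — non-equidistant thresholds, equidistant output levels — amounts to a threshold search. The thresholds below are hand-picked stand-ins for the learned ones, and the backward pass (G-STE) is omitted.

```python
import bisect

def n2u_quantize(x, thresholds):
    """Quantize a real value into uniform integer levels 0..len(thresholds)
    using learned, possibly non-equidistant thresholds (forward pass only)."""
    return bisect.bisect_right(sorted(thresholds), x)
```

The output levels are equidistant integers and therefore hardware-friendly, even though the decision boundaries in the input domain are not.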
    Margin Calibration for Long-Tailed Visual Recognition. (arXiv:2112.07225v2 [cs.CV] UPDATED)
    The long-tailed class distribution in visual recognition tasks poses great challenges for neural networks on how to handle the biased predictions between head and tail classes, i.e., the model tends to classify tail classes as head classes. While existing research focused on data resampling and loss function engineering, in this paper, we take a different perspective: the classification margins. We study the relationship between the margins and logits (classification scores) and empirically observe the biased margins and the biased logits are positively correlated. We propose MARC, a simple yet effective MARgin Calibration function to dynamically calibrate the biased margins for unbiased logits. We validate MARC through extensive experiments on common long-tailed benchmarks including CIFAR-LT, ImageNet-LT, Places-LT, and iNaturalist-LT. Experimental results demonstrate that our MARC achieves favorable results on these benchmarks. In addition, MARC is extremely easy to implement with just three lines of code. We hope this simple method will motivate people to rethink the biased margins and biased logits in long-tailed visual recognition.
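MARC's core operation — a learned per-class re-scaling of the frozen model's logits — is small enough to sketch. The exact parameterization below is an assumption, since the abstract only states that the biased margins are dynamically calibrated in a few lines of code.

```python
class MarginCalibration:
    """Post-hoc margin calibration sketch: apply a per-class scale w_c and
    shift b_c to a frozen model's logits, z'_c = w_c * z_c + b_c, so that
    tail-class margins can be adjusted without retraining the backbone."""
    def __init__(self, num_classes):
        self.w = [1.0] * num_classes   # identity initialization
        self.b = [0.0] * num_classes

    def __call__(self, logits):
        return [w * z + b for w, z, b in zip(self.w, logits, self.b)]
```

At initialization the calibration is the identity; `w` and `b` would then be learned on training data with the backbone frozen, which is what keeps the method so cheap.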
    Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection. (arXiv:2202.06934v3 [cs.CV] UPDATED)
    Detection of small objects and objects far away in the scene is a major challenge in surveillance applications. Such objects are represented by a small number of pixels in the image and lack sufficient detail, making them difficult to detect using conventional detectors. In this work, an open-source framework called Slicing Aided Hyper Inference (SAHI) is proposed that provides a generic slicing-aided inference and fine-tuning pipeline for small object detection. The proposed technique is generic in the sense that it can be applied on top of any available object detector without any fine-tuning. Experimental evaluations using object detection baselines on the Visdrone and xView aerial object detection datasets show that the proposed inference method can increase object detection AP by 6.8%, 5.1% and 5.3% for the FCOS, VFNet and TOOD detectors, respectively. Moreover, detection accuracy can be further increased with slicing-aided fine-tuning, resulting in cumulative AP increases of 12.7%, 13.4% and 14.5% in the same order. The proposed technique has been integrated with Detectron2, MMDetection and YOLOv5 models and is publicly available at https://github.com/obss/sahi.git .
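The slicing step at the heart of SAHI can be sketched as generating overlapping windows over the full image; each window is run through the detector and the per-slice detections are merged back into full-image coordinates. The exact window layout below is illustrative, not the library's output.

```python
def slice_windows(width, height, slice_size, overlap_ratio):
    """Overlapping slice windows (x1, y1, x2, y2) covering the image;
    windows at the right/bottom edges are shifted inward so every
    window keeps the full slice_size."""
    stride = int(slice_size * (1 - overlap_ratio))
    windows = []
    for y in range(0, height, stride):
        y2 = min(y + slice_size, height)
        for x in range(0, width, stride):
            x2 = min(x + slice_size, width)
            windows.append((max(0, x2 - slice_size), max(0, y2 - slice_size), x2, y2))
            if x2 == width:
                break
        if y2 == height:
            break
    return windows
```

A small object that occupies only a handful of pixels in the 1024-pixel-wide frame fills a much larger fraction of a 512-pixel slice, which is why per-slice inference helps the detector.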
    Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation. (arXiv:2203.05774v2 [eess.SY] UPDATED)
    In this work, we study the deception of a Linear-Quadratic-Gaussian (LQG) agent by manipulating the cost signals. We show that a small falsification of the cost parameters will only lead to a bounded change in the optimal policy, where the bound is linear in the amount of falsification the attacker can apply to the cost parameters. We propose an attack model in which the attacker aims to mislead the agent into learning a 'nefarious' policy by intentionally falsifying the cost parameters. We formulate the attacker's problem as a convex optimization problem and develop necessary and sufficient conditions to check the achievability of the attacker's goal. We showcase the adversarial manipulation on two types of LQG learners: the batch RL learner and the adaptive dynamic programming (ADP) learner. Our results demonstrate that with only 2.296% falsification of the cost data, the attacker misleads the batch RL learner into learning the 'nefarious' policy, which leads the vehicle to a dangerous position. The attacker can also gradually trick the ADP learner into learning the same 'nefarious' policy by consistently feeding the learner a falsified cost signal that stays close to the actual cost signal. The paper aims to raise awareness of the security threats faced by RL-enabled control systems.
    Membership Inference Attacks Against Self-supervised Speech Models. (arXiv:2111.05113v2 [cs.CR] UPDATED)
    Recently, adapting the idea of self-supervised learning (SSL) to continuous speech has started gaining attention. SSL models pre-trained on a huge amount of unlabeled audio can generate general-purpose representations that benefit a wide variety of speech processing tasks. Despite their ubiquitous deployment, however, the potential privacy risks of these models have not been well investigated. In this paper, we present the first privacy analysis of several SSL speech models using Membership Inference Attacks (MIA) under black-box access. The experimental results show that these pre-trained models are vulnerable to MIA and prone to membership information leakage, with high Area Under the Curve (AUC) at both the utterance and speaker levels. Furthermore, we conduct several ablation studies to understand the factors that contribute to the success of MIA.
    Scientific Discovery and the Cost of Measurement -- Balancing Information and Cost in Reinforcement Learning. (arXiv:2112.07535v2 [cs.LG] UPDATED)
    The use of reinforcement learning (RL) in scientific applications, such as materials design and automated chemistry, is increasing. A major challenge, however, lies in the fact that measuring the state of the system is often costly and time consuming in scientific applications, whereas policy learning with RL requires a measurement after each time step. In this work, we make the measurement costs explicit in the form of a costed reward and propose a framework that enables off-the-shelf deep RL algorithms to learn a policy for both selecting actions and determining whether or not to measure the current state of the system at each time step. In this way, the agents learn to balance the need for information with the cost of information. Our results show that, when trained under this regime, the Dueling DQN and PPO agents can learn optimal action policies whilst making up to 50% fewer state measurements, and recurrent neural networks can produce a greater than 50% reduction in measurements. We postulate that these reductions can help to lower the barrier to applying RL to real-world scientific applications.
    Control Theoretic Analysis of Temporal Difference Learning. (arXiv:2112.14417v4 [cs.AI] UPDATED)
    The goal of this paper is to present a control-theoretic analysis of linear stochastic iterative algorithms and temporal difference (TD) learning. TD-learning is a linear stochastic iterative algorithm for estimating the value function of a given policy for a Markov decision process, and is one of the most popular and fundamental reinforcement learning algorithms. While there has been a series of successful works on the theoretical analysis of TD-learning, it was not until recently that researchers found guarantees on its statistical efficiency. In this paper, we propose a control-theoretic finite-time analysis of TD-learning, which exploits standard notions from the linear systems control community. The proposed work therefore provides additional insights on TD-learning and reinforcement learning using simple concepts and analysis tools from control theory.
    Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning. (arXiv:2204.03597v1 [cs.LG])
    The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal. A popular class of approach infers the (unknown) reward function via inverse reinforcement learning (IRL) followed by maximizing this reward function via reinforcement learning (RL). The policies learned via these approaches are however very brittle in practice and deteriorate quickly even with small test-time perturbations due to compounding errors. We propose Imitation with Planning at Test-time (IMPLANT), a new meta-algorithm for imitation learning that utilizes decision-time planning to correct for compounding errors of any base imitation policy. In contrast to existing approaches, we retain both the imitation policy and the rewards model at decision-time, thereby benefiting from the learning signal of the two components. Empirically, we demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments and excels at zero-shot generalization when subject to challenging perturbations in test-time dynamics.
    Understanding Dynamics of Nonlinear Representation Learning and Its Application. (arXiv:2106.14836v3 [cs.LG] UPDATED)
    Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are subsequently used for classification at its output layer. This happens implicitly during training through minimizing a supervised or unsupervised loss. In this paper, we study the dynamics of such implicit nonlinear representation learning. We identify a pair of a new assumption and a novel condition, called the common model structure assumption and the data-architecture alignment condition. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for the global convergence and necessary for the global optimality. Moreover, our theory explains how and when increasing the network size does and does not improve the training behaviors in the practical regime. Our results provide practical guidance for designing a model structure: e.g., the common model structure assumption can be used as a justification for using a particular model structure instead of others. We also derive a new training framework, which satisfies the data-architecture alignment condition by automatically modifying any given training algorithm. Given a standard training algorithm, the framework running its modified version is empirically shown to maintain competitive test performances while providing global convergence guarantees for deep residual neural networks with convolutions, skip connections, and batch normalization with datasets, including MNIST, CIFAR-10, CIFAR-100, Semeion, KMNIST and SVHN.
    Discriminability-enforcing loss to improve representation learning. (arXiv:2202.07073v2 [cs.CV] UPDATED)
    During the training process, deep neural networks implicitly learn to represent the input data samples through a hierarchy of features, where the size of the hierarchy is determined by the number of layers. In this paper, we focus on enforcing the discriminative power of the high-level representations, that are typically learned by the deeper layers (closer to the output). To this end, we introduce a new loss term inspired by the Gini impurity, which is aimed at minimizing the entropy (increasing the discriminative power) of individual high-level features with respect to the class labels. Although our Gini loss induces highly-discriminative features, it does not ensure that the distribution of the high-level features matches the distribution of the classes. As such, we introduce another loss term to minimize the Kullback-Leibler divergence between the two distributions. We conduct experiments on two image classification data sets (CIFAR-100 and Caltech 101), considering multiple neural architectures ranging from convolutional networks (ResNet-17, ResNet-18, ResNet-50) to transformers (CvT). Our empirical results show that integrating our novel loss terms into the training objective consistently outperforms the models trained with cross-entropy alone, without increasing the inference time at all.
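One reading of the Gini-inspired term can be sketched as follows: for each high-level feature, turn its per-class mean activation into a distribution p and penalize the Gini impurity 1 - Σ p², which is minimized when a feature fires for a single class. This is an illustrative interpretation of the abstract, not the authors' exact loss; the abstract pairs it with a separate KL term matching the feature distribution to the class distribution.

```python
def gini_loss(feature_class_means):
    """Average Gini impurity of per-feature class-activation distributions.
    feature_class_means[f][c] is the mean activation of feature f on class c
    (assumed non-negative, e.g., post-ReLU)."""
    total = 0.0
    for means in feature_class_means:
        s = sum(means)
        if s == 0:                      # dead feature: no penalty
            continue
        p = [m / s for m in means]
        total += 1.0 - sum(pi * pi for pi in p)
    return total / len(feature_class_means)
```

A feature that only activates for one class contributes 0; a feature spread uniformly over k classes contributes 1 - 1/k, so minimizing the term pushes features toward class specificity.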
    Visualizing Deep Neural Networks with Topographic Activation Maps. (arXiv:2204.03528v1 [cs.LG])
    Machine Learning with Deep Neural Networks (DNNs) has become a successful tool for solving tasks across various fields of application. The success of DNNs is strongly connected to their high complexity in terms of the number of network layers or of neurons in each layer, which makes it hard to understand how DNNs solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience, a field with rich experience in analyzing complex and opaque systems. In this work, we draw inspiration from how neuroscience uses topographic maps to visualize the activity of the brain when it performs certain tasks. Transferring this approach to DNNs can help to visualize and understand their internal processes more intuitively, too. However, the inner structures of brains and DNNs differ substantially. Therefore, to visualize the activations of neurons in DNNs as topographic maps, we research techniques to lay out the neurons in a two-dimensional space such that neurons with similar activity are in the vicinity of each other. In this work, we introduce and compare different methods for obtaining a topographic layout of the neurons in a network layer. Moreover, we demonstrate how to use the resulting topographic activation maps to identify errors or encoded biases in DNNs or data sets. Our novel visualization technique improves the transparency of DNN-based algorithmic decision-making systems and is accessible to a broad audience because topographic maps are intuitive to interpret without expert knowledge in Machine Learning.
    Deep learning method for identifying mass composition of ultra-high-energy cosmic rays. (arXiv:2112.02072v2 [astro-ph.IM] UPDATED)
    We introduce a novel method for identifying the mass composition of ultra-high-energy cosmic rays using deep learning. The key idea of the method is to use a chain of two neural networks. The first network predicts the type of a primary particle for individual events, while the second infers the mass composition of an ensemble of events. We apply this method to Monte Carlo data for the Telescope Array Surface Detector readings, on which it yields an unprecedentedly low error of 7% for the 4-component approximation. We also discuss the problems of applying the developed method to the experimental data, and the ways they can be resolved.
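The second stage's target can be illustrated with a naive counting stand-in: given per-event type predictions from the first network, the ensemble composition is the fraction of events assigned to each primary type. The paper trains a second network for this step; the sketch below (hypothetical names) only shows the simple baseline quantity that network infers.

```python
def ensemble_composition(event_predictions, n_classes=4):
    """Empirical mass composition of an ensemble: the fraction of
    events predicted as each primary type (4-component approximation)."""
    counts = [0] * n_classes
    for p in event_predictions:
        counts[p] += 1
    total = len(event_predictions)
    return [c / total for c in counts]

# Four events: two of type 0, one of type 1, one of type 3.
print(ensemble_composition([0, 0, 1, 3]))  # [0.5, 0.25, 0.0, 0.25]
```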
    Learning and Transferring Value Function for Robot Exploration in Subterranean Environments. (arXiv:2204.03140v1 [cs.RO])
    In traditional robot exploration methods, the robot usually does not have prior biases about the environment it is exploring. Thus the robot assigns equal importance to all goals, which leads to inefficient exploration. Alternatively, a hand-tuned policy is often used to tweak the value of goals. In this paper, we present a method to learn how "good" some states are, measured by the state value function, to provide a hint for the robot to make exploration decisions. We propose to learn state value functions from previously collected offline datasets and then transfer and improve the value function during testing in a new environment. Moreover, the environments usually have very few, or even no, extrinsic rewards or feedback for the robot. Therefore, in this work we also tackle the problem of sparse extrinsic rewards from the environments. We design several intrinsic rewards to encourage the robot to obtain more information during exploration. These reward functions then become the building blocks of the state value functions. We test our method on challenging subterranean and urban environments. To the best of our knowledge, this work is the first to demonstrate value function prediction with previously collected datasets to help exploration in challenging subterranean environments.
    Security Aspects of Quantum Machine Learning: Opportunities, Threats and Defenses. (arXiv:2204.03625v1 [cs.CR])
    In the last few years, quantum computing has experienced a growth spurt. One exciting avenue of quantum computing is quantum machine learning (QML), which can exploit the high-dimensional Hilbert space to learn richer representations from limited data and thus can efficiently solve complex learning tasks. Despite the increased interest in QML, there have not been many studies that discuss the security aspects of QML. In this work, we explore the possible future applications of QML in the hardware security domain. We also expose the security vulnerabilities of QML and emerging attack models, and corresponding countermeasures.
    Heterogeneous Target Speech Separation. (arXiv:2204.03594v1 [cs.SD])
    We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc.). Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts and learn cross-domain representations under a variety of concepts used as conditioning. Our experiments show that training separation models with heterogeneous conditions facilitates generalization to new concepts with unseen out-of-domain data while also performing substantially better than single-domain specialist models. Notably, such training leads to more robust learning of new, harder discriminative concepts for source separation and can yield improvements over permutation invariant training with oracle source selection. We analyze the intrinsic behavior of source separation training with heterogeneous metadata and propose ways to alleviate emerging problems with challenging separation conditions. We release the collection of preparation recipes for all datasets used, to further promote research on this challenging task.
    Covariate-assisted Sparse Tensor Completion. (arXiv:2103.06428v3 [stat.ML] UPDATED)
    We aim to provably complete a sparse and highly-missing tensor in the presence of covariate information along tensor modes. Our motivation comes from online advertising, where users' click-through rates (CTR) on ads over various devices form a CTR tensor that has about 96% missing entries and many zeros among the non-missing entries, which makes standalone tensor completion methods unsatisfactory. Besides the CTR tensor, additional ad features or user characteristics are often available. In this paper, we propose Covariate-assisted Sparse Tensor Completion (COSTCO) to incorporate covariate information for the recovery of the sparse tensor. The key idea is to jointly extract latent components from both the tensor and the covariate matrix to learn a synthetic representation. Theoretically, we derive the error bound for the recovered tensor components and explicitly quantify the improvements in both the reveal probability condition and the tensor recovery accuracy due to covariates. Finally, we apply COSTCO to an advertisement dataset consisting of a CTR tensor and an ad covariate matrix, leading to a 23% accuracy improvement over the baseline. An important by-product is that the ad latent components from COSTCO reveal interesting ad clusters, which are useful for better ad targeting.
    Amortized Auto-Tuning: Cost-Efficient Bayesian Transfer Optimization for Hyperparameter Recommendation. (arXiv:2106.09179v2 [cs.LG] UPDATED)
    With the surge in the number of hyperparameters and training times of modern machine learning models, hyperparameter tuning is becoming increasingly expensive. However, after assessing 40 tuning methods systematically, we find that each faces certain limitations. In particular, methods that speed up tuning via knowledge transfer typically require the final performance of hyperparameters and do not focus on low-fidelity information. As we demonstrate empirically, this common practice is suboptimal and can incur an unnecessary use of resources. It is more cost-efficient to instead leverage low-fidelity tuning observations to measure inter-task similarity and transfer knowledge from existing to new tasks accordingly. However, performing multi-fidelity tuning comes with its own challenges in the transfer setting: the noise in additional observations and the need for performance forecasting. Therefore, we propose and conduct a thorough analysis of a multi-task multi-fidelity Bayesian optimization framework, which leads to the best instantiation--amortized auto-tuning (AT2). We further present an offline-computed 27-task hyperparameter recommendation (HyperRec) database to serve the community. Extensive experiments on HyperRec and other real-world databases illustrate the effectiveness of our AT2 method.
    DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search. (arXiv:2011.02166v2 [cs.CV] UPDATED)
    The convolutional neural network has achieved great success in computer vision tasks, despite a large computation overhead that hinders efficient deployment. Structured (channel) pruning is usually applied to reduce model redundancy while preserving the network structure, such that the pruned network can be easily deployed in practice. However, existing structured pruning methods require hand-crafted rules, which may lead to a tremendous pruning space. In this paper, we introduce Differentiable Annealing Indicator Search (DAIS), which leverages the strength of neural architecture search for channel pruning and automatically searches for an effective pruned model under given constraints on computation overhead. Specifically, DAIS relaxes the binarized channel indicators to be continuous and then jointly learns both indicators and model parameters via bi-level optimization. To bridge the non-negligible discrepancy between the continuous model and the target binarized model, DAIS proposes an annealing-based procedure to steer the indicator convergence towards binarized states. Moreover, DAIS designs various regularizations based on a priori structural knowledge to control the pruning sparsity and to improve model performance. Experimental results show that DAIS outperforms state-of-the-art pruning methods on CIFAR-10, CIFAR-100, and ImageNet.
    Variational Autoencoder based Metamodeling for Multi-Objective Topology Optimization of Electrical Machines. (arXiv:2201.08877v2 [cs.LG] UPDATED)
    Conventional magneto-static finite element analysis of electrical machine designs is time-consuming and computationally expensive. Since each machine topology has a distinct set of parameters, design optimization is commonly performed independently. This paper presents a novel method for predicting Key Performance Indicators (KPIs) of differently parameterized electrical machine topologies at the same time, by mapping the high-dimensional integrated design parameters into a lower-dimensional latent space using a variational autoencoder. After training, the decoder and a multi-layer neural network function, via the latent space, as meta-models for sampling new designs and predicting the associated KPIs, respectively. This enables parameter-based concurrent multi-topology optimization.
    End-To-End Optimization of Online Neural Network-supported Two-Stage Dereverberation for Hearing Devices. (arXiv:2204.02978v1 [eess.AS])
    A two-stage online dereverberation algorithm for hearing devices is presented in this paper. The approach combines a multi-channel multi-frame linear filtering approach with a single-channel single-frame post-filter. Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs). This contribution extends our prior work, which showed that directly optimizing for a criterion at the output of the multi-channel linear filtering stage results in more efficient dereverberation than placing the criterion at the output of the DNN to optimize the PSD estimation. In the present work, we show that the dereverberation performance of the proposed first stage particularly improves the early-to-mid reverberation ratio when trained end-to-end. We thus argue that it can be combined with a post-filtering stage which benefits from the early-to-mid ratio improvement and is consequently able to efficiently suppress the residual late reverberation. The proposed two-stage procedure is shown to be very effective in terms of both dereverberation performance and computational demands. Furthermore, the proposed system can be adapted to the needs of different types of hearing-device users by controlling the amount of reduction of early reflections. The proposed system outperforms the previously proposed end-to-end DNN-supported linear filtering algorithm, as well as other traditional approaches, in an evaluation using the noise-free version of the WHAMR! dataset.
    A comparison of mixed-variables Bayesian optimization approaches. (arXiv:2111.01533v2 [math.OC] UPDATED)
    Most real optimization problems are defined over a mixed search space where the variables are both discrete and continuous. In engineering applications, the objective function is typically calculated with a numerically costly black-box simulation. General mixed and costly optimization problems are therefore of great practical interest, yet their resolution remains in large part an open scientific question. In this article, costly mixed problems are approached through Gaussian processes where the discrete variables are relaxed into continuous latent variables. The continuous space is more easily explored by classical Bayesian optimization techniques than a mixed space would be. Discrete variables are recovered either subsequently to the continuous optimization, or simultaneously with an additional continuous-discrete compatibility constraint that is handled with augmented Lagrangians. Several possible implementations of such Bayesian mixed optimizers are compared. In particular, the reformulation of the problem with continuous latent variables is put in competition with searches working directly in the mixed space. Among the algorithms involving latent variables and an augmented Lagrangian, particular attention is devoted to the Lagrange multipliers, for which local and global estimation techniques are studied. The comparisons are based on the repeated optimization of three analytical functions and a beam design problem.
    Concentration Network for Reinforcement Learning of Large-Scale Multi-Agent Systems. (arXiv:2203.06416v2 [cs.AI] UPDATED)
    When dealing with a series of imminent issues, humans can naturally concentrate on a subset of these concerning issues by prioritizing them according to their contributions to motivational indices, e.g., the probability of winning a game. This idea of concentration offers insights into reinforcement learning of sophisticated Large-scale Multi-Agent Systems (LMAS) involving hundreds of agents. In such an LMAS, each agent receives a long series of entity observations at each step, which can overwhelm existing aggregation networks such as graph attention networks and cause inefficiency. In this paper, we propose a concentration network called ConcNet. First, ConcNet scores the observed entities considering several motivational indices, e.g., expected survival time and state value of the agents, and then ranks, prunes, and aggregates the encodings of the observed entities to extract features. Second, distinct from the well-known attention mechanism, ConcNet has a unique motivational subnetwork to explicitly consider the motivational indices when scoring the observed entities. Furthermore, we present a concentration policy gradient architecture that can learn effective policies in LMAS from scratch. Extensive experiments demonstrate that the presented architecture has excellent scalability and flexibility, and significantly outperforms existing methods on LMAS benchmarks.
    On the Limitations of Multimodal VAEs. (arXiv:2110.04121v2 [cs.LG] UPDATED)
    Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs, which are completely unsupervised. In an attempt to explain this gap, we uncover a fundamental limitation that applies to a large family of mixture-based multimodal VAEs. We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. Empirically, we showcase the generative quality gap on both synthetic and real data and present the tradeoffs between different variants of multimodal VAEs. We find that none of the existing approaches fulfills all desired criteria of an effective multimodal generative model when applied on more complex datasets than those used in previous benchmarks. In summary, we identify, formalize, and validate fundamental limitations of VAE-based approaches for modeling weakly-supervised data and discuss implications for real-world applications.
    Towards Improving Selective Prediction Ability of NLP Systems. (arXiv:2008.09371v3 [cs.CL] UPDATED)
    It's better to say "I can't answer" than to answer incorrectly. This selective prediction ability is crucial for NLP systems to be reliably deployed in real-world applications. Prior work has shown that existing selective prediction techniques fail to perform well, especially in the out-of-domain setting. In this work, we propose a method that improves probability estimates of models by calibrating them using prediction confidence and difficulty score of instances. Using these two signals, we first annotate held-out instances and then train a calibrator to predict the likelihood of correctness of the model's prediction. We instantiate our method with Natural Language Inference (NLI) and Duplicate Detection (DD) tasks and evaluate it in both In-Domain (IID) and Out-of-Domain (OOD) settings. In (IID, OOD) settings, we show that the representations learned by our calibrator result in an improvement of (15.81%, 5.64%) and (6.19%, 13.9%) over 'MaxProb' -- a selective prediction baseline -- on NLI and DD tasks respectively.
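The 'MaxProb' baseline mentioned at the end can be sketched in a few lines: answer with the argmax class only when the top probability clears a threshold, otherwise abstain. The threshold value and function name below are illustrative, not from the paper; the proposed calibrator replaces the raw probability with a learned likelihood of correctness.

```python
def maxprob_selective(probs, threshold=0.8):
    """'MaxProb' selective prediction baseline: return the argmax class
    only when its probability clears the threshold; otherwise abstain
    (return None, i.e., the model says "I can't answer")."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None

print(maxprob_selective([0.05, 0.9, 0.05]))   # 1 (confident: answer)
print(maxprob_selective([0.4, 0.35, 0.25]))   # None (abstain)
```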
    Multi-Sample $\zeta$-mixup: Richer, More Realistic Synthetic Samples from a $p$-Series Interpolant. (arXiv:2204.03323v1 [cs.LG])
    Modern deep learning training procedures rely on model regularization techniques such as data augmentation methods, which generate training samples that increase the diversity of data and richness of label information. A popular recent method, mixup, uses convex combinations of pairs of original samples to generate new samples. However, as we show in our experiments, mixup can produce undesirable synthetic samples, where the data is sampled off the manifold and can contain incorrect labels. We propose $\zeta$-mixup, a generalization of mixup with provably and demonstrably desirable properties that allows convex combinations of $N \geq 2$ samples, leading to more realistic and diverse outputs that incorporate information from $N$ original samples by using a $p$-series interpolant. We show that, compared to mixup, $\zeta$-mixup better preserves the intrinsic dimensionality of the original datasets, which is a desirable property for training generalizable models. Furthermore, we show that our implementation of $\zeta$-mixup is faster than mixup, and extensive evaluation on controlled synthetic and 24 real-world natural and medical image classification datasets shows that $\zeta$-mixup outperforms mixup and traditional data augmentation techniques.
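The core idea, a convex combination of $N \geq 2$ samples with $p$-series weights, can be sketched as follows. This is a minimal illustration assuming normalized weights $w_i \propto i^{-p}$; the exact weighting and randomization used by $\zeta$-mixup may differ, and the function names are hypothetical.

```python
def p_series_weights(n, p=2.0):
    """Normalized p-series weights w_i proportional to i^(-p).
    Larger p concentrates mass on the first (dominant) sample, keeping
    the synthetic point close to the data manifold."""
    raw = [i ** -p for i in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def zeta_mixup(samples, weights):
    """Convex combination of N >= 2 samples (each a feature vector)."""
    dim = len(samples[0])
    return [sum(w * s[d] for w, s in zip(weights, samples)) for d in range(dim)]
```

With $N = 2$ and a weight pair like $(0.8, 0.2)$ this reduces to ordinary mixup; the generalization draws information from more neighbors while the $p$-series decay keeps one sample dominant.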
    Inference over radiative transfer models using variational and expectation maximization methods. (arXiv:2204.03346v1 [cs.LG])
    Earth observation from satellites offers the possibility to monitor our planet with unprecedented accuracy. Radiative transfer models (RTMs) encode the energy transfer through the atmosphere, and are used to model and understand the Earth system, as well as to estimate the parameters that describe the status of the Earth from satellite observations by inverse modeling. However, performing inference over such simulators is a challenging problem: RTMs are nonlinear, non-differentiable and computationally costly codes, which makes inference particularly difficult. In this paper, we introduce two computational techniques to infer not only point estimates of biophysical parameters but also their joint distribution. One of them is based on a variational autoencoder approach and the second one on a Monte Carlo Expectation Maximization (MCEM) scheme. We compare and discuss the benefits and drawbacks of each approach. We also provide numerical comparisons in synthetic simulations and on the real PROSAIL model, a popular RTM that combines land vegetation leaf and canopy modeling. We analyze the performance of the two approaches for modeling and inferring the distribution of three key biophysical parameters for quantifying the terrestrial biosphere.
    Data Justice Stories: A Repository of Case Studies. (arXiv:2204.03100v1 [cs.CY])
    The idea of "data justice" is of recent academic vintage. It has arisen over the past decade in Anglo-European research institutions as an attempt to bring together a critique of the power dynamics that underlie accelerating trends of datafication with a normative commitment to the principles of social justice: a commitment to the achievement of a society that is equitable, fair, and capable of confronting the root causes of injustice. However, despite the seeming novelty of such a data justice pedigree, this joining up of the critique of the power imbalances that have shaped the digital and "big data" revolutions with a commitment to social equity and constructive societal transformation has a deeper historical, and more geographically diverse, provenance. As the stories of the data justice initiatives, activism, and advocacy contained in this volume well evidence, practices of data justice across the globe have, in fact, largely preceded the elaboration and crystallisation of the idea of data justice in contemporary academic discourse. In telling these data justice stories, we hope to provide the reader with two interdependent tools of data justice thinking: First, we aim to provide the reader with the critical leverage needed to discern those distortions and malformations of data justice that manifest in subtle and explicit forms of power, domination, and coercion. Second, we aim to provide the reader with access to the historically effective forms of normativity and ethical insight that have been marshalled by data justice activists and advocates as tools of societal transformation, so that these forms of normativity and insight can be drawn on, in turn, as constructive resources to spur future transformative data justice practices.
    Correcting Misproduced Speech using Spectrogram Inpainting. (arXiv:2204.03379v1 [eess.AS])
    Learning a new language involves constantly comparing speech productions with reference productions from the environment. Early in speech acquisition, children make articulatory adjustments to match their caregivers' speech. Grownup learners of a language tweak their speech to match the tutor reference. This paper proposes a method to synthetically generate correct pronunciation feedback given incorrect production. Furthermore, our aim is to generate the corrected production while maintaining the speaker's original voice. The system prompts the user to pronounce a phrase. The speech is recorded, and the samples associated with the inaccurate phoneme are masked with zeros. This waveform serves as an input to a speech generator, implemented as a deep learning inpainting system with a U-net architecture, and trained to output a reconstructed speech. The training set is composed of unimpaired proper speech examples, and the generator is trained to reconstruct the original proper speech. We evaluated the performance of our system on phoneme replacement of minimal pair words of English as well as on children with pronunciation disorders. Results suggest that human listeners slightly prefer our generated speech over a smoothed replacement of the inaccurate phoneme with a production of a different speaker.
    Accelerating Attention through Gradient-Based Learned Runtime Pruning. (arXiv:2204.03227v1 [cs.CL])
    Self-attention is a key enabler of state-of-the-art accuracy for various transformer-based Natural Language Processing models. This attention mechanism calculates a correlation score for each word with respect to the other words in a sentence. Commonly, only a small subset of words highly correlates with the word under attention, and this subset is only determined at runtime. As such, a significant amount of computation is inconsequential due to low attention scores and can potentially be pruned. The main challenge is finding the threshold for the scores below which subsequent computation will be inconsequential. Although such a threshold is discrete, this paper formulates its search through a soft differentiable regularizer integrated into the loss function of the training. This formulation piggybacks on the back-propagation training to analytically co-optimize the threshold and the weights simultaneously, striking a formally optimal balance between accuracy and computation pruning. To best utilize this mathematical innovation, we devise a bit-serial architecture, dubbed LeOPArd, for transformer language models with a bit-level early termination microarchitectural mechanism. We evaluate our design across 43 back-end tasks for MemN2N, BERT, ALBERT, GPT-2, and Vision transformer models. Post-layout results show that, on average, LeOPArd yields 1.9x speedup and 3.9x energy reduction, while keeping the average accuracy virtually intact (<0.2% degradation).
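At inference time, the effect of the learned threshold is a hard cutoff on attention scores before the softmax. The sketch below (hypothetical names; the paper learns the threshold during training via the differentiable regularizer and implements the pruning in hardware) only illustrates why sub-threshold positions cost no downstream computation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def pruned_attention(scores, threshold):
    """Drop attention scores below the (learned) threshold before the
    softmax; pruned positions then contribute nothing downstream.
    Returns {position: weight} over the surviving positions only."""
    kept = [(i, s) for i, s in enumerate(scores) if s >= threshold]
    if not kept:  # degenerate case: keep the single best score
        kept = [max(enumerate(scores), key=lambda t: t[1])]
    idx, vals = zip(*kept)
    return dict(zip(idx, softmax(list(vals))))
```

For example, `pruned_attention([2.0, -1.0, 0.5], 0.0)` keeps only positions 0 and 2 and renormalizes over them.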
    Pin the Memory: Learning to Generalize Semantic Segmentation. (arXiv:2204.03609v1 [cs.CV])
    The rise of deep neural networks has led to several breakthroughs for semantic segmentation. In spite of this, a model trained on a source domain often fails to work properly in new challenging domains, which directly concerns the generalization capability of the model. In this paper, we present a novel memory-guided domain generalization method for semantic segmentation based on a meta-learning framework. In particular, our method abstracts the conceptual knowledge of semantic classes into a categorical memory which is constant beyond the domains. Building on the meta-learning concept, we repeatedly train memory-guided networks and simulate virtual tests to 1) learn how to memorize domain-agnostic and distinct information about the classes and 2) offer an externally settled memory as class-guidance to reduce the ambiguity of representation in the test data of an arbitrary unseen domain. To this end, we also propose memory divergence and feature cohesion losses, which encourage the network to learn memory reading and update processes for category-aware domain generalization. Extensive experiments for semantic segmentation demonstrate the superior generalization capability of our method over state-of-the-art works on various benchmarks.
    Class-Incremental Learning with Strong Pre-trained Models. (arXiv:2204.03634v1 [cs.CV])
    Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes). Instead, we explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes. We hypothesize that a strong base model can provide a good representation for novel classes and that incremental learning can be done with small adaptations. We propose a 2-stage training scheme, i) feature augmentation -- cloning part of the backbone and fine-tuning it on the novel data, and ii) fusion -- combining the base and novel classifiers into a unified classifier. Experiments show that the proposed method significantly outperforms state-of-the-art CIL methods on the large-scale ImageNet dataset (e.g., +10% overall accuracy over the best). We also propose and analyze understudied practical CIL scenarios, such as base-novel overlap with distribution shift. Our proposed method is robust and generalizes to all analyzed CIL settings.
    GNNLens: A Visual Analytics Approach for Prediction Error Diagnosis of Graph Neural Networks. (arXiv:2011.11048v6 [cs.HC] UPDATED)
    Graph Neural Networks (GNNs) aim to extend deep learning techniques to graph data and have achieved significant progress in graph analysis tasks (e.g., node classification) in recent years. However, similar to other deep neural networks like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), GNNs behave like a black box with their details hidden from model developers and users. It is therefore difficult to diagnose possible errors of GNNs. Despite many visual analytics studies being done on CNNs and RNNs, little research has addressed the challenges for GNNs. This paper fills the research gap with an interactive visual analysis tool, GNNLens, to assist model developers and users in understanding and analyzing GNNs. Specifically, Parallel Sets View and Projection View enable users to quickly identify and validate error patterns in the set of wrong predictions; Graph View and Feature Matrix View offer a detailed analysis of individual nodes to assist users in forming hypotheses about the error patterns. Since GNNs jointly model the graph structure and the node features, we reveal the relative influences of the two types of information by comparing the predictions of three models: GNN, Multi-Layer Perceptron (MLP), and GNN Without Using Features (GNNWUF). Two case studies and interviews with domain experts demonstrate the effectiveness of GNNLens in facilitating the understanding of GNN models and their errors.
    Joint Adaptive Graph and Structured Sparsity Regularization for Unsupervised Feature Selection. (arXiv:2010.05454v3 [cs.LG] UPDATED)
    Feature selection is an important data preprocessing step in data mining and machine learning which can reduce the feature dimension without deteriorating a model's performance. Since obtaining annotated data is laborious or even infeasible in many cases, unsupervised feature selection is more practical in reality. Though many methods for unsupervised feature selection have been proposed, these methods select features independently, so there is no guarantee that the group of selected features is optimal. What's more, the number of selected features must be tuned carefully to obtain a satisfactory result. To tackle these problems, we propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method in this paper, in which an $l_{2,0}$-norm regularization term with respect to the transformation matrix is imposed in the manifold learning for feature selection, and a graph regularization term is incorporated into the learning model to learn the local geometric structure of the data adaptively. An efficient and simple iterative algorithm is designed to solve the proposed optimization problem, together with an analysis of its computational complexity. After optimization, a subset of optimal features is selected as a group, and the number of selected features is determined automatically. Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method compared with several state-of-the-art approaches.
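The $l_{2,0}$ regularizer counts the rows of the transformation matrix whose $l_2$ norm is nonzero, so each surviving row corresponds to one selected feature and the count emerges from the optimization rather than being tuned. A minimal sketch of that quantity (hypothetical name; a tolerance is added for floating point):

```python
def l20_norm(W, tol=1e-9):
    """The l_{2,0} "norm" of a matrix W: the number of rows with nonzero
    l2 norm. Rows driven exactly to zero by the regularizer correspond
    to features that are dropped; nonzero rows are the selected group."""
    return sum(1 for row in W if sum(x * x for x in row) ** 0.5 > tol)
```

For example, a 3-feature transformation matrix with one all-zero row selects two features.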
    Fast Design Space Exploration of Nonlinear Systems: Part I. (arXiv:2104.01747v7 [cs.LG] UPDATED)
    System design tools are often only available as input-output blackboxes: for a given design as input they compute an output representing system behavior. Blackboxes are intended to be run in the forward direction. This paper presents a new method of solving the inverse design problem: namely, given requirements or constraints on the output, find an input that also optimizes an objective function. This problem is challenging for several reasons. First, blackboxes are not designed to be run in reverse. Second, inputs and outputs can be discrete and continuous. Third, finding designs concurrently satisfying a set of requirements is hard because designs satisfying individual requirements may conflict with each other. Fourth, blackbox evaluations can be expensive. Finally, blackboxes can sometimes fail to produce an output. This paper presents CNMA, a new method of solving the inverse problem that overcomes these challenges. CNMA tries to sample only the part of the design space relevant to solving the problem, leveraging the power of neural networks, Mixed Integer Linear Programs, and a new learning-from-failure feedback loop. The paper also presents a parallel version of CNMA that improves the efficiency and quality of solutions over the sequential version, and tries to steer it away from local optima. CNMA's performance is evaluated against conventional optimization methods on seven nonlinear design problems of 8 (two problems), 10, 15, 36 and 60 real-valued dimensions, and one with 186 binary dimensions. The conventional methods evaluated are off-the-shelf implementations of Bayesian Optimization with Gaussian Processes, Nelder-Mead and Random Search. The first two do not solve problems that are high-dimensional, that have both discrete and continuous variables, or whose blackboxes can fail to return values. CNMA solves all problems, and surpasses the performance of the conventional methods by up to 87%.
    DeepTensor: Low-Rank Tensor Decomposition with Deep Network Priors. (arXiv:2204.03145v1 [stat.AP])
    DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors (e.g., a matrix as the outer product of two vectors), where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-squared approximation error. Our key observation is that the implicit regularization inherent in DNs enables them to capture nonlinear signal structures (e.g., manifolds) that are out of the reach of classical linear methods like the singular value decomposition (SVD) and principal component analysis (PCA). Furthermore, in contrast to the SVD and PCA, whose performance deteriorates when the tensor's entries deviate from additive white Gaussian noise, we demonstrate that the performance of DeepTensor is robust to a wide range of distributions. We validate that DeepTensor is a robust and computationally efficient drop-in replacement for the SVD, PCA, nonnegative matrix factorization (NMF), and similar decompositions by exploring a range of real-world applications, including hyperspectral image denoising, 3D MRI tomography, and image classification. In particular, DeepTensor offers a 6dB signal-to-noise ratio improvement over standard denoising methods for signals corrupted by Poisson noise and learns to decompose 3D tensors 60 times faster than a single DN equipped with 3D convolutions.
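    For contrast with the DN-generated factors, the classical linear rank-1 decomposition that DeepTensor positions itself against (an SVD/PCA-style factorization) can be sketched with plain power iteration. This is our own illustration of the baseline, not the paper's method; the function name and toy matrix are hypothetical.

```python
import math

def rank1_approx(M, iters=50):
    """Power-iteration rank-1 factorization M ~ outer(u, v): the
    classical linear baseline that DN-generated factors replace."""
    n, m = len(M), len(M[0])
    v = [1.0] * m
    for _ in range(iters):
        # u <- M v, normalized; then v <- M^T u
        u = [sum(M[i][j] * v[j] for j in range(m)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / norm for x in u]
        v = [sum(M[i][j] * u[i] for i in range(n)) for j in range(m)]
    return u, v  # M[i][j] ~ u[i] * v[j]

# Exactly rank-1 matrix: outer([2, 1], [1, 2])
u, v = rank1_approx([[2.0, 4.0], [1.0, 2.0]])
```

On exactly low-rank, Gaussian-noise-free data this baseline is optimal; the abstract's point is that it degrades under non-Gaussian corruption (e.g., Poisson noise), where the self-supervised DN factors remain robust.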
    MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids. (arXiv:2204.03305v1 [eess.AS])
    Improving the user's hearing ability to understand speech in noisy environments is critical to the development of hearing aid (HA) devices. For this, it is important to derive a metric that can fairly predict speech intelligibility for HA users. A straightforward approach is to conduct a subjective listening test and use the test results as an evaluation metric. However, conducting large-scale listening tests is time-consuming and expensive. Therefore, several evaluation metrics were derived as surrogates for subjective listening test results. In this study, we propose a multi-branched speech intelligibility prediction model (MBI-Net), for predicting the subjective intelligibility scores of HA users. MBI-Net consists of two branches of models, with each branch consisting of a hearing loss model, a cross-domain feature extraction module, and a speech intelligibility prediction model, to process speech signals from one channel. The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores. Experimental results confirm the effectiveness of MBI-Net, which produces higher prediction scores than the baseline system in Track 1 and Track 2 on the Clarity Prediction Challenge 2022 dataset.
    Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews. (arXiv:2104.05861v3 [cs.SE] UPDATED)
    Context: Mobile app reviews written by users on app stores or social media are significant resources for app developers. Analyzing app reviews has proved to be useful for many areas of software engineering (e.g., requirement engineering, testing). Automatic classification of app reviews requires extensive effort to manually curate a labeled dataset. When the classification purpose changes (e.g., identifying bugs versus usability issues or sentiment), new datasets should be labeled, which prevents the extensibility of the developed models for new desired classes/tasks in practice. Recent pre-trained neural language models (PTM) are trained on large corpora in an unsupervised manner and have found success in solving similar Natural Language Processing problems. However, the applicability of PTMs has not been explored for app review classification. Objective: We investigate the benefits of PTMs for app review classification compared to the existing models, as well as the transferability of PTMs in multiple settings. Method: We empirically study the accuracy and time efficiency of PTMs compared to prior approaches using six datasets from the literature. In addition, we investigate the performance of PTMs trained on app reviews (i.e., domain-specific PTMs). We set up different studies to evaluate PTMs in multiple settings: binary vs. multi-class classification, zero-shot classification (when new labels are introduced to the model), multi-task setting, and classification of reviews from different resources. The datasets are manually labeled app review datasets from the Google Play Store, the Apple App Store, and Twitter data. In all cases, Micro and Macro Precision, Recall, and F1-scores will be used, and we will report the time required for training and prediction with the models.
    Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning. (arXiv:2004.10888v6 [cs.LG] UPDATED)
    We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable. MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf, in both on- and off-policy settings. This flexibility reduces the gap between risk-neutral control and risk-averse control and is achieved by working on a novel augmented MDP directly. We propose risk-averse TD3 as an example instantiating MVPI, which outperforms vanilla TD3 and many previous risk-averse control methods in challenging Mujoco robot simulation tasks under a risk-aware performance metric. This risk-averse TD3 is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are key to the performance boost we show in Mujoco domains.
    Differentially Private Set Union. (arXiv:2002.09745v2 [cs.CR] UPDATED)
    We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real-world applications; it is particularly ubiquitous in natural language processing (NLP) applications such as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting, ensuring privacy is significantly more delicate. We prove that any policy which has certain $\textit{contractive}$ properties results in a differentially private algorithm. We design two new algorithms, one using Laplace noise and the other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.
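    The count-and-threshold paradigm described above (collect capped per-user contributions, add noise to the counts, release items above a threshold) can be sketched in a few lines. This is a simplified illustration of the baseline, not the paper's policy-based algorithms; the function names, the per-user cap, and the parameter values are our own choices.

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of Laplace(0, scale) noise.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_set_union(user_sets, epsilon, threshold, max_contrib=1, seed=0):
    """Baseline count-and-threshold set union sketch: each user
    contributes at most `max_contrib` items, and only items whose
    noisy count exceeds `threshold` are released."""
    rng = random.Random(seed)
    counts = {}
    for items in user_sets:
        for item in sorted(items)[:max_contrib]:  # cap each user's contribution
            counts[item] = counts.get(item, 0) + 1
    scale = max_contrib / epsilon  # noise scale = sensitivity / epsilon
    return {item for item, c in counts.items()
            if c + laplace_noise(scale, rng) > threshold}

# Common items survive the noisy threshold; rare items are suppressed.
users = [{"the"}] * 50 + [{"rare"}]
released = dp_set_union(users, epsilon=1.0, threshold=15.0)
```

The wastefulness the abstract criticizes is visible here: the count for "the" lands far above the threshold, and that excess mass does nothing to help release rarer items.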
    Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention. (arXiv:2204.03479v1 [cs.CL])
    Multi-head self-attention forms the core of Transformer networks. However, their quadratically growing complexity with respect to the input sequence length impedes their deployment on resource-constrained edge devices. We address this challenge by proposing a dynamic pruning method, which exploits the temporal stability of data across tokens to reduce inference cost. The threshold-based method only retains significant differences between the subsequent tokens, effectively reducing the number of multiply-accumulates, as well as the internal tensor data sizes. The approach is evaluated on the Google Speech Commands Dataset for keyword spotting, and the performance is compared against the baseline Keyword Transformer. Our experiments show that we can reduce ~80% of operations while maintaining the original 98.4% accuracy. Moreover, a reduction of ~87-94% operations can be achieved when only degrading the accuracy by 1-4%, speeding up the multi-head self-attention inference by a factor of ~7.5-16.
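    The threshold-based retention idea, reusing cached results for token components that barely changed and recomputing only significant deltas, can be illustrated with a toy sketch. This is our simplified model of the mechanism, not the paper's implementation, and all names and values are hypothetical.

```python
def delta_prune(tokens, threshold):
    """Keep only per-component changes larger than `threshold`;
    smaller changes reuse the cached value, skipping their
    multiply-accumulates downstream."""
    ref = list(tokens[0])               # first token is processed in full
    kept, total = len(ref), len(ref) * len(tokens)
    pruned = [list(ref)]
    for tok in tokens[1:]:
        cur = []
        for j, x in enumerate(tok):
            if abs(x - ref[j]) > threshold:
                ref[j] = x              # significant delta: recompute
                kept += 1
            cur.append(ref[j])          # otherwise reuse cached value
        pruned.append(cur)
    return pruned, kept / total         # fraction of operations retained

# Slowly varying token stream: most components are pruned away.
tokens = [[1.0, 2.0], [1.01, 2.5], [1.02, 2.51]]
out, frac = delta_prune(tokens, threshold=0.1)
```

Raising the threshold trades accuracy for fewer retained operations, which is the ~80-94% operation-reduction knob the abstract reports.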
    Contrastive Learning Inverts the Data Generating Process. (arXiv:2102.08850v4 [cs.LG] UPDATED)
    Contrastive learning has recently seen tremendous success in self-supervised learning. So far, however, it is largely unclear why the learned representations generalize so effectively to a large variety of downstream tasks. We here prove that feedforward models trained with objectives belonging to the commonly used InfoNCE family learn to implicitly invert the underlying generative model of the observed data. While the proofs make certain statistical assumptions about the generative model, we observe empirically that our findings hold even if these assumptions are severely violated. Our theory highlights a fundamental connection between contrastive learning, generative modeling, and nonlinear independent component analysis, thereby furthering our understanding of the learned representations as well as providing a theoretical foundation to derive more effective contrastive losses.
    Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes. (arXiv:2102.00135v6 [cs.LG] UPDATED)
    We present new policy mirror descent (PMD) methods for solving reinforcement learning (RL) problems with either strongly convex or general convex regularizers. By exploring the structural properties of these overall highly nonconvex problems, we show that the PMD methods exhibit a fast linear rate of convergence to the global optimum. We develop stochastic counterparts of these methods, and establish an ${\cal O}(1/\epsilon)$ (resp., ${\cal O}(1/\epsilon^2)$) sampling complexity for solving these RL problems with strongly (resp., general) convex regularizers using different sampling schemes, where $\epsilon$ denotes the target accuracy. We further show that the complexity for computing the gradients of these regularizers, if necessary, can be bounded by ${\cal O}\{(\log_\gamma \epsilon) [(1-\gamma)L/\mu]^{1/2}\log (1/\epsilon)\}$ (resp., ${\cal O} \{(\log_\gamma \epsilon ) (L/\epsilon)^{1/2}\}$) for problems with strongly (resp., general) convex regularizers. Here $\gamma$ denotes the discounting factor. To the best of our knowledge, these complexity bounds, along with our algorithmic developments, appear to be new in both the optimization and RL literature. The introduction of these convex regularizers also greatly expands the flexibility and applicability of RL models.
    Survey on Automated Short Answer Grading with Deep Learning: from Word Embeddings to Transformers. (arXiv:2204.03503v1 [cs.CL])
    Automated short answer grading (ASAG) has gained attention in education as a means to scale educational tasks to the growing number of students. Recent progress in Natural Language Processing and Machine Learning has largely influenced the field of ASAG, of which we survey the recent research advancements. We complement previous surveys by providing a comprehensive analysis of recently published methods that deploy deep learning approaches. In particular, we focus our analysis on the transition from hand-engineered features to representation learning approaches, which learn representative features for the task at hand automatically from large corpora of data. We structure our analysis of deep learning methods along three categories: word embeddings, sequential models, and attention-based methods. Deep learning impacted ASAG differently than other fields of NLP, as we noticed that the learned representations alone do not achieve the best results; rather, they work in a complementary way with hand-engineered features. The best performance is indeed achieved by methods that combine carefully hand-engineered features with the power of the semantic descriptions provided by the latest models, such as transformer architectures. We identify challenges and provide an outlook on research directions that can be addressed in the future.
    An Overview on Artificial Intelligence Techniques for Diagnosis of Schizophrenia Based on Magnetic Resonance Imaging Modalities: Methods, Challenges, and Future Works. (arXiv:2103.03081v2 [cs.LG] UPDATED)
    Schizophrenia (SZ) is a mental disorder that typically emerges in late adolescence or early adulthood. It reduces the life expectancy of patients by 15 years. Abnormal behavior, perception of emotions, social relationships, and reality perception are among its most significant symptoms. Past studies have revealed that the temporal and anterior lobes and the hippocampus regions of the brain are affected by SZ. Also, an increased volume of cerebrospinal fluid (CSF) and decreased volumes of white and gray matter can be observed due to this disease. Magnetic resonance imaging (MRI) is a popular neuroimaging technique used to explore structural/functional brain abnormalities in SZ disorder owing to its high spatial resolution. Various artificial intelligence (AI) techniques have been employed with advanced image/signal processing methods to obtain an accurate diagnosis of SZ. This paper presents a comprehensive overview of studies conducted on the automated diagnosis of SZ using MRI modalities. The main findings, various challenges, and future works in developing automated SZ detection are described in this paper.
    Unified Contrastive Learning in Image-Text-Label Space. (arXiv:2204.03610v1 [cs.CV])
    Visual recognition is recently learned via either supervised learning on human-annotated image-label data or language-image contrastive learning with webly-crawled image-text pairs. While supervised learning may result in a more discriminative representation, language-image pretraining shows unprecedented zero-shot recognition capability, largely due to the different properties of data sources and learning objectives. In this work, we introduce a new formulation by combining the two data sources into a common image-text-label space. In this space, we propose a new learning paradigm, called Unified Contrastive Learning (UniCL) with a single learning objective to seamlessly prompt the synergy of two data types. Extensive experiments show that our UniCL is an effective way of learning semantically rich yet discriminative representations, universally for image recognition in zero-shot, linear-probe, fully finetuning and transfer learning scenarios. Particularly, it attains gains up to 9.2% and 14.5% in average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively. In linear probe setting, it also boosts the performance over the two methods by 7.3% and 3.4%, respectively. Our study also indicates that UniCL stand-alone is a good learner on pure image-label data, rivaling the supervised learning methods across three image classification datasets and two types of vision backbones, ResNet and Swin Transformer. Code is available at https://github.com/microsoft/UniCL.
    A Pathology-Based Machine Learning Method to Assist in Epithelial Dysplasia Diagnosis. (arXiv:2204.03572v1 [eess.IV])
    Epithelial Dysplasia (ED) is a tissue alteration commonly present in lesions preceding oral cancer, its presence being one of the most important factors in the progression toward carcinoma. This study proposes a method to design a low-computational-cost classification system to support the detection of dysplastic epithelia, contributing to reducing the variability of pathologist assessments. We employ a multilayer artificial neural network (MLP-ANN) and define the regions of the epithelium to be assessed based on the knowledge of the pathologist. The performance of the proposed solution was statistically evaluated. The implemented MLP-ANN presented an average accuracy of 87%, with variability much inferior to that obtained from three trained evaluators. Moreover, the proposed solution led to results which are very close to those obtained using a convolutional neural network (CNN) implemented by transfer learning, with 100 times less computational complexity. In conclusion, our results show that a simple neural network structure can lead to performance equivalent to that of much more complex structures, which are routinely used in the literature.
    Equivariance Discovery by Learned Parameter-Sharing. (arXiv:2204.03640v1 [cs.LG])
    Designing equivariance as an inductive bias into deep-nets has been a prominent approach to build effective models, e.g., a convolutional neural network incorporates translation equivariance. However, incorporating these inductive biases requires knowledge about the equivariance properties of the data, which may not be available, e.g., when encountering a new domain. To address this, we study how to discover interpretable equivariances from data. Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes. We propose to use the partition distance to empirically quantify the accuracy of the recovered equivariance. Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme. Empirically, we show that the approach recovers known equivariances, such as permutations and shifts, on sum of numbers and spatially-invariant data.
    Bidimensional linked matrix factorization for pan-omics pan-cancer analysis. (arXiv:2002.02601v2 [stat.ML] UPDATED)
    Several modern applications require the integration of multiple large data matrices that have shared rows and/or columns. For example, cancer studies that integrate multiple omics platforms across multiple types of cancer, pan-omics pan-cancer analysis, have extended our knowledge of molecular heterogeneity beyond what was observed in single-tumor and single-platform studies. However, these studies have been limited by available statistical methodology. We propose a flexible approach to the simultaneous factorization and decomposition of variation across such bidimensionally linked matrices, BIDIFAC+. This decomposes variation into a series of low-rank components that may be shared across any number of row sets (e.g., omics platforms) or column sets (e.g., cancer types). This builds on a growing literature for the factorization and decomposition of linked matrices, which has primarily focused on multiple matrices that are linked in one dimension (rows or columns) only. Our objective function extends nuclear norm penalization, is motivated by random matrix theory, gives an identifiable decomposition under relatively mild conditions, and can be shown to give the mode of a Bayesian posterior distribution. We apply BIDIFAC+ to pan-omics pan-cancer data from TCGA, identifying shared and specific modes of variability across 4 different omics platforms and 29 different cancer types.
    Interval Bound Propagation--aided Few-shot Learning. (arXiv:2204.03511v1 [cs.LG])
    Few-shot learning aims to transfer the knowledge acquired from training on a diverse set of tasks, from a given task distribution, to generalize to unseen tasks, from the same distribution, with a limited amount of labeled data. The underlying requirement for effective few-shot generalization is to learn a good representation of the task manifold. One way to encourage this is to preserve local neighborhoods in the feature space learned by the few-shot learner. To this end, we introduce the notion of interval bounds from the provably robust training literature to few-shot learning. The interval bounds are used to characterize neighborhoods around the training tasks. These neighborhoods can then be preserved by minimizing the distance between a task and its respective bounds. We further introduce a novel strategy to artificially form new tasks for training by interpolating between the available tasks and their respective interval bounds, to aid in cases with a scarcity of tasks. We apply our framework to both model-agnostic meta-learning as well as prototype-based metric-learning paradigms. The efficacy of our proposed approach is evident from the improved performance on several datasets from diverse domains in comparison to a sizable number of recent competitors.
    An optimized hybrid solution for IoT based lifestyle disease classification using stress data. (arXiv:2204.03573v1 [eess.SP])
    Stress, anxiety, and nervousness are all high-risk health states in everyday life. Previously, stress levels were determined by speaking with people and gaining insight into what they had experienced recently or in the past. Typically, stress is caused by an incident that occurred a long time ago, but sometimes it is triggered by unknown factors. This is a challenging and complex task, but recent research advances have provided numerous opportunities to automate it. The fundamental features of most of these techniques are electrodermal activity (EDA) and heart rate variability (HRV) values. We utilized an accelerometer to measure body motions to solve this challenge. The proposed novel method employs a test that measures a subject's electrocardiogram (ECG), galvanic skin values (GSV), HRV values, and body movements in order to provide a low-cost and time-saving solution for detecting stress-related lifestyle disease in modern times using cyber-physical systems. This study provides a new hybrid model for lifestyle disease classification that decreases execution time while picking the best collection of characteristics and increases classification accuracy. The developed approach is capable of dealing with the class imbalance problem by using the WESAD (wearable stress and affect detection) dataset. The new model uses the Grid search (GS) method to select an optimized set of hyperparameters, and it uses a combination of the Correlation coefficient-based Recursive feature elimination (CoC-RFE) method for optimal feature selection and gradient boosting as an estimator to classify the dataset, which achieves high accuracy and helps to provide smart, accurate, and high-quality healthcare systems. To demonstrate the validity and utility of the proposed methodology, its performance is compared to those of other well-established machine learning models.
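    The grid-search step of such a pipeline can be sketched generically. The following stand-in is ours (it does not reproduce the paper's CoC-RFE feature selection or its gradient-boosting estimator); it simply evaluates every hyperparameter combination and keeps the best-scoring one.

```python
from itertools import product

def grid_search(estimator, param_grid, score):
    """Exhaustive grid search: build every combination from
    `param_grid`, score the resulting estimator, return the best."""
    best_params, best_score = None, float("-inf")
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        s = score(estimator(**params))   # fit/evaluate would go here
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Toy example: the "estimator" just returns its params, and the score
# peaks at lr=0.1, depth=3 (all values hypothetical).
grid = {"lr": [0.01, 0.1], "depth": [2, 3]}
score = lambda p: -(p["lr"] - 0.1) ** 2 - (p["depth"] - 3) ** 2
best, best_s = grid_search(lambda **p: p, grid, score)
```

In practice the score would be cross-validated accuracy of the fitted classifier, and the grid size grows multiplicatively with each added hyperparameter.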
    RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems. (arXiv:2011.07401v2 [cs.PF] UPDATED)
    With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called Reinforcement Learning for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space, while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.
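    The core control structure of RL-QN, applying a learned policy on a finite subset of the state space and a known stabilizing policy everywhere else, can be sketched as follows; the toy single-queue policies are our own illustration, not the paper's learned model.

```python
def make_rlqn_policy(learned, stabilizing, bounded_set):
    """Dispatch sketch: use the model-based learned policy inside the
    finite subset, fall back to the stabilizing policy outside it."""
    def policy(state):
        if state in bounded_set:
            return learned(state)
        return stabilizing(state)
    return policy

# Toy single queue: state = backlog. Inside the bounded region the
# "learned" policy serves one job; outside it, drain aggressively so
# the chain returns to the bounded region (hypothetical actions).
learned = lambda q: 1
stabilizing = lambda q: 2
policy = make_rlqn_policy(learned, stabilizing, bounded_set=set(range(10)))
```

The paper's result is that with an appropriately chosen subset, the average backlog under this hybrid scheme can be made arbitrarily close to optimal.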
    Position-based Prompting for Health Outcome Generation. (arXiv:2204.03489v1 [cs.CL])
    Probing Pre-trained Language Models (PLMs) using prompts has indirectly implied that language models (LMs) can be treated as knowledge bases. This phenomenon has been especially effective when these LMs are fine-tuned not just toward data of a specific domain, but also toward the style or linguistic pattern of the prompts themselves. We observe that satisfying a particular linguistic pattern in prompts is an unsustainable constraint that unnecessarily lengthens the probing task, especially because prompts are often manually designed and the range of possible prompt template patterns can vary depending on the prompting objective and domain. We therefore explore the idea of using a position-attention mechanism to capture positional information of each word in a prompt relative to the mask to be filled, hence avoiding the need to re-construct prompts when the prompt's linguistic pattern changes. Using our approach, we demonstrate the ability to elicit answers to rare prompt templates (in a case study on health outcome generation), such as Postfix and Mixed patterns, whose missing information is respectively at the start and in multiple random places of the prompt. Moreover, using various biomedical PLMs, our approach consistently outperforms a baseline in which the default masked language model (MLM) representation is used to predict masked tokens.
    Modeling Label Correlations for Second-Order Semantic Dependency Parsing with Mean-Field Inference. (arXiv:2204.03619v1 [cs.CL])
    Second-order semantic parsing with end-to-end mean-field inference has shown good performance. In this work, we aim to improve this method by modeling label correlations between adjacent arcs. However, direct modeling leads to memory explosion because the second-order score tensors have sizes of $O(n^3L^2)$ ($n$ is the sentence length and $L$ is the number of labels), which is not affordable. To tackle this computational challenge, we leverage tensor decomposition techniques, and interestingly, we show that the large second-order score tensors need not be materialized during mean-field inference, thereby reducing the computational complexity from cubic to quadratic. We conduct experiments on the SemEval 2015 Task 18 English datasets, showing the effectiveness of modeling label correlations. Our code is publicly available at https://github.com/sustcsonglin/mean-field-dep-parsing.
    FedCos: A Scene-adaptive Federated Optimization Enhancement for Performance Improvement. (arXiv:2204.03174v1 [cs.LG])
    As an emerging technology, federated learning (FL) involves training machine learning models over distributed edge devices, which attracts sustained attention and has been extensively studied. However, the heterogeneity of client data severely degrades the performance of FL compared with that of centralized training. It causes the locally trained models of clients to move in different directions. On the one hand, it slows down or even stalls the global updates, leading to inefficient communication. On the other hand, it enlarges the distances between local models, resulting in an aggregated global model with poor performance. Fortunately, these shortcomings can be mitigated by reducing the angle between the directions that local models move in. Based on this fact, we propose FedCos, which reduces the directional inconsistency of local models by introducing a cosine-similarity penalty. It promotes the local model iterations toward an auxiliary global direction. Moreover, our approach auto-adapts to various non-IID settings without an elaborate selection of hyperparameters. The experimental results show that FedCos outperforms the well-known baselines and can enhance them under a variety of FL scenes, including varying degrees of data heterogeneity, different numbers of participants, and cross-silo and cross-device settings. Besides, FedCos improves communication efficiency by 2 to 5 times. With the help of FedCos, multiple FL methods require significantly fewer communication rounds than before to obtain a model with comparable performance.
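    The cosine-similarity penalty can be illustrated with a minimal sketch: the local objective is augmented with a term that grows as the local update direction drifts away from the auxiliary global direction. The exact form of the penalty below is our assumption for illustration, not necessarily the one used in FedCos.

```python
import math

def cosine(u, v):
    """Cosine similarity between two nonzero direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def penalized_loss(task_loss, local_dir, global_dir, mu):
    """Hypothetical FedCos-style objective: penalize directional
    inconsistency, with mu controlling the penalty strength."""
    return task_loss + mu * (1.0 - cosine(local_dir, global_dir))
```

A local update aligned with the global direction incurs no penalty; an orthogonal one pays the full `mu`, pulling client iterates toward a common direction.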
    Distributed NLI: Learning to Predict Human Opinion Distributions for Language Reasoning. (arXiv:2104.08676v2 [cs.CL] UPDATED)
    We introduce distributed NLI, a new NLU task whose goal is to predict the distribution of human judgements for natural language inference. We show that by applying additional distribution estimation methods, namely Monte Carlo (MC) Dropout, Deep Ensemble, Re-Calibration, and Distribution Distillation, models can capture human judgement distributions more effectively than the softmax baseline. We show that MC Dropout is able to achieve decent performance without any distribution annotations, while Re-Calibration can give further improvements with extra distribution annotations, suggesting the value of multiple annotations per example in modeling the distribution of human judgements. Despite these improvements, the best results are still far below the estimated human upper bound, indicating that predicting the distribution of human judgements is still an open, challenging problem with large room for improvement. We showcase the common errors for MC Dropout and Re-Calibration. Finally, we give guidelines on the usage of these methods with different levels of data availability and encourage future work on modeling the human opinion distribution for language reasoning. Our code and data are publicly available at https://github.com/easonnie/ChaosNLI.
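    MC Dropout, one of the distribution estimation methods compared above, keeps dropout active at inference time and averages the softmax outputs of several stochastic forward passes. A minimal sketch follows, with a toy three-class "model" standing in for a real NLI network; all names and logit values are hypothetical.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def mc_dropout_distribution(logits_fn, n_passes=100, seed=0):
    """Run `n_passes` stochastic forward passes (dropout left on) and
    average the softmax outputs into one predicted distribution."""
    rng = random.Random(seed)
    dists = [softmax(logits_fn(rng)) for _ in range(n_passes)]
    k = len(dists[0])
    return [sum(d[i] for d in dists) / n_passes for i in range(k)]

# Toy "model": entail/neutral/contradict logits with Bernoulli dropout.
def noisy_logits(rng):
    keep = [rng.random() > 0.5 for _ in range(3)]
    base = [2.0, 1.0, 0.5]
    return [b if k else 0.0 for b, k in zip(base, keep)]

dist = mc_dropout_distribution(noisy_logits)
```

The averaged output is a proper probability distribution over labels, which can then be scored against the empirical distribution of human annotations.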
    Statistical Model Criticism of Variational Auto-Encoders. (arXiv:2204.03030v1 [cs.LG])
    We propose a framework for the statistical evaluation of variational auto-encoders (VAEs) and test two instances of this framework in the context of modelling images of handwritten digits and a corpus of English text. Our take on evaluation is based on the idea of statistical model criticism, popular in Bayesian data analysis, whereby a statistical model is evaluated in terms of its ability to reproduce statistics of an unknown data generating process from which we can obtain samples. A VAE learns not one, but two joint distributions over a shared sample space, each exploiting a choice of factorisation that makes sampling tractable in one of two directions (latent-to-data, data-to-latent). We evaluate samples from these distributions, assessing their (marginal) fit to the observed data and our choice of prior, and we also evaluate samples through a pipeline that connects the two distributions starting from a data sample, assessing whether together they exploit and reveal latent factors of variation that are useful to a practitioner. We show that this methodology offers possibilities for model selection qualitatively beyond intrinsic evaluation metrics and at a finer granularity than commonly used statistics can offer.
    Improving Urban Mobility: using artificial intelligence and new technologies to connect supply and demand. (arXiv:2204.03570v1 [cs.CY])
    As the demand for mobility in our society seems to increase, the various issues centered on urban mobility are among those that most worry city inhabitants on this planet. For instance, how to go from A to B in an efficient (but also less stressful) way? These questions and concerns have not changed even during the covid-19 pandemic; on the contrary, as things currently stand, people who are avoiding public transportation are only contributing to an increase in vehicular traffic. The area of intelligent transportation systems (ITS) aims at investigating how to employ information and communication technologies to solve problems related to transportation. This may mean monitoring and managing the infrastructure (e.g., traffic roads, traffic signals, etc.). However, currently, ITS is also targeting the management of demand. In this panorama, artificial intelligence plays an important role, especially with the advances in machine learning that translate into the use of computer vision, connected and autonomous vehicles, agent-based simulation, among others. In the present work, several works developed by our group are discussed in a holistic perspective, i.e., they cover not only the supply side (as commonly found in ITS works), but also the demand side, and, in a novel perspective, the integration of both.
    Neural Implicit Flow: a mesh-agnostic dimensionality reduction paradigm of spatio-temporal data. (arXiv:2204.03216v1 [cs.LG])
High-dimensional spatio-temporal dynamics can often be encoded in a low-dimensional subspace. Engineering applications for modeling, characterization, design, and control of such large-scale systems often rely on dimensionality reduction to make solutions computationally tractable in real-time. Common existing paradigms for dimensionality reduction include linear methods, such as the singular value decomposition (SVD), and nonlinear methods, such as variants of convolutional autoencoders (CAE). However, these encoding techniques lack the ability to efficiently represent the complexity associated with spatio-temporal data, which often requires variable geometry, non-uniform grid resolution, adaptive meshing, and/or parametric dependencies. To resolve these practical engineering challenges, we propose a general framework called Neural Implicit Flow (NIF) that enables a mesh-agnostic, low-rank representation of large-scale, parametric, spatio-temporal data. NIF consists of two modified multilayer perceptrons (MLPs): (i) ShapeNet, which isolates and represents the spatial complexity, and (ii) ParameterNet, which accounts for any other input complexity, including parametric dependencies, time, and sensor measurements. We demonstrate the utility of NIF for parametric surrogate modeling, enabling the interpretable representation and compression of complex spatio-temporal dynamics, efficient many-spatial-query tasks, and improved generalization performance for sparse reconstruction.
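The ShapeNet/ParameterNet split can be read as a hypernetwork: one MLP emits the weights of another, which is then queried at arbitrary spatial coordinates. A minimal, untrained NumPy sketch of that wiring (the layer sizes and the single time input are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# --- ShapeNet: a tiny MLP over a scalar spatial coordinate x, whose
# weights are supplied externally as a flat parameter vector theta. ---
HIDDEN = 8
N_SHAPE_PARAMS = HIDDEN + HIDDEN + HIDDEN + 1   # W1, b1, W2, b2

def shapenet(theta, x):
    W1 = theta[:HIDDEN].reshape(HIDDEN, 1)
    b1 = theta[HIDDEN:2 * HIDDEN].reshape(HIDDEN, 1)
    W2 = theta[2 * HIDDEN:3 * HIDDEN].reshape(1, HIDDEN)
    b2 = theta[3 * HIDDEN]
    return (W2 @ np.tanh(W1 @ x.reshape(1, -1) + b1) + b2).ravel()

# --- ParameterNet: maps the remaining inputs (here just time t) to
# ShapeNet's weights; untrained random weights, for illustration only. ---
Wp = rng.normal(scale=0.1, size=(N_SHAPE_PARAMS, 1))
bp = rng.normal(scale=0.1, size=N_SHAPE_PARAMS)

def parameternet(t):
    return (Wp @ np.array([[t]])).ravel() + bp

# Mesh-agnostic query: evaluate the field at arbitrary coordinates.
t = 0.5
x_query = np.array([0.0, 0.3, 1.7])   # no grid required
u = shapenet(parameternet(t), x_query)
```

Because ShapeNet takes coordinates rather than grid indices, the same trained pair can be queried on any mesh, which is the source of the mesh-agnostic property.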
    Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results. (arXiv:2204.03475v1 [cs.CV])
ImageNet serves as the primary dataset for evaluating the quality of computer-vision models. The common practice today is training each architecture with a tailor-made scheme, designed and tuned by an expert. In this paper, we present a unified scheme for training any backbone on ImageNet. The scheme, named USI (Unified Scheme for ImageNet), is based on knowledge distillation and modern tricks. It requires no adjustments or hyper-parameter tuning between different models, and is efficient in terms of training times. We test USI on a wide variety of architectures, including CNNs, Transformers, Mobile-oriented and MLP-only. On all models tested, USI outperforms previous state-of-the-art results. Hence, we are able to transform training on ImageNet from an expert-oriented task to an automatic, seamless routine. Since USI accepts any backbone and trains it to top results, it also makes it possible to perform methodical comparisons and identify the most efficient backbones along the speed-accuracy Pareto curve. Implementation is available at: https://github.com/Alibaba-MIIL/Solving_ImageNet
    Optimizing the Long-Term Behaviour of Deep Reinforcement Learning for Pushing and Grasping. (arXiv:2204.03487v1 [cs.LG])
We investigate the "Visual Pushing for Grasping" (VPG) system by Zeng et al. and the "Hourglass" system by Ewerton et al., an evolution of the former. The focus of our work is the investigation of the capabilities of both systems to learn long-term rewards and policies. Zeng et al.'s original task only needs a limited amount of foresight. Ewerton et al. attain their best performance using an agent which only takes the most immediate action under consideration. We are interested in the ability of their models and training algorithms to accurately predict long-term Q-Values. To evaluate this ability, we design a new bin sorting task and reward function. Our task requires agents to accurately estimate future rewards and therefore use high discount factors in their Q-Value calculation. We investigate the behaviour of an adaptation of the VPG training algorithm on our task. We show that this adaptation cannot accurately predict the required long-term action sequences. In addition to the limitations identified by Ewerton et al., it suffers from the known Deep Q-Learning problem of overestimated Q-Values. In an effort to solve our task, we turn to the Hourglass models and combine them with the Double Q-Learning approach. We show that this approach enables the models to accurately predict long-term action sequences when trained with large discount factors. Our results show that the Double Q-Learning technique is essential for training with very high discount factors, as the models' Q-Value predictions diverge otherwise. We also experiment with different approaches for discount factor scheduling, loss calculation and exploration procedures. Our results show that the latter factors do not visibly influence the model's performance for our task.
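The Double Q-Learning ingredient the authors rely on can be stated compactly: one value table selects the greedy next action while the other evaluates it, which damps the overestimation bias of standard Q-Learning. A tabular sketch with toy values (the states, actions, and numbers are hypothetical):

```python
def double_q_update(Q1, Q2, s, a, r, s_next, alpha=0.5, gamma=0.99):
    """One tabular Double Q-Learning update on Q1: Q1 picks the greedy
    next action, Q2 supplies its value for the bootstrap target."""
    a_star = max(Q1[s_next], key=Q1[s_next].get)   # argmax under Q1
    target = r + gamma * Q2[s_next][a_star]        # evaluated under Q2
    Q1[s][a] += alpha * (target - Q1[s][a])

# Toy 2-state, 2-action tables (hypothetical values):
Q1 = {0: {0: 0.0, 1: 0.0}, 1: {0: 1.0, 1: 0.0}}
Q2 = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.2, 1: 0.0}}
double_q_update(Q1, Q2, s=0, a=0, r=1.0, s_next=1)
```

Note the target uses Q2's value 0.2 for the bootstrapped action, not Q1's optimistic 1.0; in the deep setting the two tables become the online and target networks.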
    Learning to Compose Soft Prompts for Compositional Zero-Shot Learning. (arXiv:2204.03574v1 [cs.LG])
    We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs) without the overhead of fine-tuning the entire model. VLMs can represent arbitrary classes as natural language prompts in their flexible text encoders but they underperform state-of-the-art methods on compositional zero-shot benchmark tasks. To improve VLMs, we propose a novel form of soft prompting. We treat the attributes and objects that are composed to define classes as learnable tokens of vocabulary and tune them on multiple prompt compositions. During inference, we recompose the learned attribute-object vocabulary in new combinations and show that CSP outperforms the original VLM on benchmark datasets by an average of 14.7 percentage points of accuracy. CSP also achieves new state-of-the-art accuracies on two out of three benchmark datasets, while only fine-tuning a small number of parameters. Further, we show that CSP improves generalization to higher-order attribute-attribute-object compositions and combinations of pretrained attributes and fine-tuned objects.
    Risk-based regulation for all: The need and a method for a wide adoption solution for data-driven inspection targeting. (arXiv:2204.03583v1 [cs.LG])
Access to data and data processing, including the use of machine learning techniques, has become significantly easier and cheaper in recent years. Nevertheless, solutions that can be widely adopted by regulators for market monitoring and inspection targeting in a data-driven way have not been frequently discussed by the scientific community. This article discusses the need for, and the difficulties in, developing such solutions, presents an effective method to address regulation planning, and illustrates its use to account for the most important and common subject for the majority of regulators: the consumer. This article hopes to contribute to raising the awareness of the regulatory community of the need for data processing methods that are objective, impartial, transparent, explainable, simple to implement and of low computational cost, aiming at the implementation of risk-based regulation throughout the world.
    BERTuit: Understanding Spanish language in Twitter through a native transformer. (arXiv:2204.03465v1 [cs.CL])
The appearance of complex attention-based language models such as BERT, RoBERTa or GPT-3 has made it possible to address highly complex tasks in a plethora of scenarios. However, when applied to specific domains, these models encounter considerable difficulties. This is the case with social networks such as Twitter, an ever-changing stream of information written in informal and complex language, where each message requires careful evaluation to be understood even by humans, given the important role that context plays. Addressing tasks in this domain through Natural Language Processing involves severe challenges. When powerful state-of-the-art multilingual language models are applied to this scenario, language-specific nuances tend to get lost in translation. To face these challenges we present \textbf{BERTuit}, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets using RoBERTa optimization. Our motivation is to provide a powerful resource for better understanding Spanish Twitter, to be used in applications focused on this social network, with special emphasis on solutions devoted to tackling the spread of misinformation on this platform. BERTuit is evaluated on several tasks and compared against M-BERT, XLM-RoBERTa and XLM-T, very competitive multilingual transformers. The utility of our approach is shown with applications, in this case: a zero-shot methodology to visualize groups of hoaxes, and profiling of authors spreading disinformation. Misinformation spreads wildly on platforms such as Twitter in languages other than English, meaning that the performance of transformers may suffer when transferred outside English-speaking communities.
    Learning to Solve Travelling Salesman Problem with Hardness-adaptive Curriculum. (arXiv:2204.03236v1 [cs.LG])
Various neural network models have been proposed to tackle combinatorial optimization problems such as the travelling salesman problem (TSP). Existing learning-based TSP methods assume a simple setting in which the training and testing data are independent and identically distributed. However, the existing literature fails to solve TSP instances when training and testing data have different distributions. Concretely, we find that different training and testing distributions result in more difficult TSP instances, i.e., the solution obtained by the model has a large gap from the optimal solution. To tackle this problem, in this work, we study learning-based TSP methods when training and testing data have different distributions using adaptive hardness, i.e., how difficult a TSP instance can be for a solver. This problem is challenging because it is non-trivial to (1) define a hardness measurement quantitatively; (2) efficiently and continuously generate sufficiently hard TSP instances during model training; (3) fully utilize instances with different levels of hardness to learn a more powerful TSP solver. To solve these challenges, we first propose a principled hardness measurement to quantify the hardness of TSP instances. Then, we propose a hardness-adaptive generator to generate instances with different hardness. We further propose a curriculum learner that fully utilizes these instances to train the TSP solver. Experiments show that our hardness-adaptive generator can generate instances ten times harder than the existing methods, and our proposed method achieves significant improvement over state-of-the-art models in terms of the optimality gap.
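One natural hardness proxy (not necessarily the paper's measurement) is the optimality gap of a solver on a given instance. A brute-force sketch on a tiny instance, with a nearest-neighbour heuristic standing in for the learned solver:

```python
import itertools
import math
import random

def tour_len(tour, pts):
    """Length of a closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def optimal_len(pts):
    """Exact optimum by enumerating all tours (tiny instances only)."""
    n = len(pts)
    return min(tour_len((0,) + p, pts)
               for p in itertools.permutations(range(1, n)))

def greedy_len(pts):
    """Nearest-neighbour heuristic, standing in for a learned solver."""
    unvisited, tour = set(range(1, len(pts))), [0]
    while unvisited:
        nxt = min(unvisited, key=lambda j: math.dist(pts[tour[-1]], pts[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour_len(tour, pts)

def hardness(pts):
    """Optimality gap of the solver on this instance: 0 = solved optimally."""
    return greedy_len(pts) / optimal_len(pts) - 1.0

rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(7)]
h = hardness(pts)
```

A hardness-adaptive generator would then search for point configurations that push this gap up for the current solver, rather than sampling uniformly.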
    What You See is What You Get: Distributional Generalization for Algorithm Design in Deep Learning. (arXiv:2204.03230v1 [cs.LG])
    We investigate and leverage a connection between Differential Privacy (DP) and the recently proposed notion of Distributional Generalization (DG). Applying this connection, we introduce new conceptual tools for designing deep-learning methods that bypass "pathologies" of standard stochastic gradient descent (SGD). First, we prove that differentially private methods satisfy a "What You See Is What You Get (WYSIWYG)" generalization guarantee: whatever a model does on its train data is almost exactly what it will do at test time. This guarantee is formally captured by distributional generalization. WYSIWYG enables principled algorithm design in deep learning by reducing $\textit{generalization}$ concerns to $\textit{optimization}$ ones: in order to mitigate unwanted behavior at test time, it is provably sufficient to mitigate this behavior on the train data. This is notably false for standard (non-DP) methods, hence this observation has applications even when privacy is not required. For example, importance sampling is known to fail for standard SGD, but we show that it has exactly the intended effect for DP-trained models. Thus, with DP-SGD, unlike with SGD, we can influence test-time behavior by making principled train-time interventions. We use these insights to construct simple algorithms which match or outperform SOTA in several distributional robustness applications, and to significantly improve the privacy vs. disparate impact trade-off of DP-SGD. Finally, we also improve on known theoretical bounds relating differential privacy, stability, and distributional generalization.
    Generalised Latent Assimilation in Heterogeneous Reduced Spaces with Machine Learning Surrogate Models. (arXiv:2204.03497v1 [cs.LG])
Reduced-order modelling and low-dimensional surrogate models generated using machine learning algorithms have been widely applied in high-dimensional dynamical systems to improve their algorithmic efficiency. In this paper, we develop a system which combines reduced-order surrogate models with a novel data assimilation (DA) technique used to incorporate real-time observations from different physical spaces. We make use of local smooth surrogate functions which link the space of encoded system variables and the one of current observations to perform variational DA with a low computational cost. The new system, named Generalised Latent Assimilation, can benefit from both the efficiency provided by reduced-order modelling and the accuracy of data assimilation. A theoretical analysis of the difference between the surrogate and original assimilation cost functions is also provided in this paper, where an upper bound, depending on the size of the local training set, is given. The new approach is tested on a high-dimensional CFD application of a two-phase liquid flow with non-linear observation operators that current Latent Assimilation methods cannot handle. Numerical results demonstrate that the proposed assimilation approach can significantly improve the reconstruction and prediction accuracy of the deep learning surrogate model, which is nearly 1000 times faster than the CFD simulation.
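The variational DA step can be illustrated with the classical linear-Gaussian 3D-Var cost, which blends a background state with an observation and admits a closed-form minimiser. A toy two-dimensional sketch (the covariances and observation operator are made-up values; the actual method works in learned latent spaces with non-linear operators, where no closed form exists):

```python
import numpy as np

B = np.eye(2) * 0.5          # background-error covariance (toy)
R = np.array([[0.1]])        # observation-error covariance (toy)
H = np.array([[1.0, 0.0]])   # observe only the first coordinate
x_b = np.array([1.0, 2.0])   # background (prior) state
y = np.array([0.0])          # incoming observation

def cost(x):
    """3D-Var cost: background misfit + observation misfit."""
    db, do = x - x_b, H @ x - y
    return db @ np.linalg.inv(B) @ db + do @ np.linalg.inv(R) @ do

# Closed-form minimiser for the linear-Gaussian case (Kalman-style gain):
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
x_a = x_b + K @ (y - H @ x_b)
```

The analysis `x_a` moves the observed coordinate toward `y` (weighted by the two covariances) and leaves the unobserved coordinate at its background value.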
    Explicit Feature Interaction-aware Graph Neural Networks. (arXiv:2204.03225v1 [cs.LG])
Graph neural networks are powerful methods to handle graph-structured data. However, existing graph neural networks only learn higher-order feature interactions implicitly. Thus, they cannot capture the information contained in low-order feature interactions. To overcome this problem, we propose the Explicit Feature Interaction-aware Graph Neural Network (EFI-GNN), which explicitly learns arbitrary-order feature interactions. EFI-GNN can jointly learn with any other graph neural network. We demonstrate that the joint learning method consistently enhances performance on various node classification tasks. Furthermore, since EFI-GNN is inherently a linear model, we can interpret its prediction results. Using its computation rule, we can obtain the effect of a feature of any order on the decision, and we visualize the effects of first-order and second-order features as heatmaps.
    Half-sibling regression meets exoplanet imaging: PSF modeling and subtraction using a flexible, domain knowledge-driven, causal framework. (arXiv:2204.03439v1 [astro-ph.IM])
    High-contrast imaging of exoplanets hinges on powerful post-processing methods to denoise the data and separate the signal of a companion from its host star, which is typically orders of magnitude brighter. Existing post-processing algorithms do not use all prior domain knowledge that is available about the problem. We propose a new method that builds on our understanding of the systematic noise and the causal structure of the data-generating process. Our algorithm is based on a modified version of half-sibling regression (HSR), a flexible denoising framework that combines ideas from the fields of machine learning and causality. We adapt the method to address the specific requirements of high-contrast exoplanet imaging data obtained in pupil tracking mode. The key idea is to estimate the systematic noise in a pixel by regressing the time series of this pixel onto a set of causally independent, signal-free predictor pixels. We use regularized linear models in this work; however, other (non-linear) models are also possible. In a second step, we demonstrate how the HSR framework allows us to incorporate observing conditions such as wind speed or air temperature as additional predictors. When we apply our method to four data sets from the VLT/NACO instrument, our algorithm provides a better false-positive fraction than PCA-based PSF subtraction, a popular baseline method in the field. Additionally, we find that the HSR-based method provides direct and accurate estimates for the contrast of the exoplanets without the need to insert artificial companions for calibration in the data sets. Finally, we present first evidence that using the observing conditions as additional predictors can improve the results. Our HSR-based method provides an alternative, flexible and promising approach to the challenge of modeling and subtracting the stellar PSF and systematic noise in exoplanet imaging data.
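The core regression step can be sketched directly: explain a target pixel's time series with causally independent, signal-free predictor pixels and keep the residual as the estimated signal. A toy NumPy version with regularized linear (ridge) regression, as the paper uses, on synthetic data (the noise model and signal here are invented for illustration):

```python
import numpy as np

def hsr_denoise(target, predictors, lam=1.0):
    """Half-sibling regression sketch: ridge-regress the target pixel's
    time series onto signal-free predictor pixels; what the predictors
    explain is systematic noise, and the residual is the signal estimate."""
    X, y = predictors, target
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    systematics = X @ w
    return y - systematics

rng = np.random.default_rng(0)
T = 200
common_noise = rng.normal(size=T)             # shared systematic (e.g. speckles)
predictors = np.stack([common_noise + 0.1 * rng.normal(size=T)
                       for _ in range(5)], axis=1)
signal = 0.5 * np.sin(np.linspace(0, 6, T))   # toy "companion" signal
target = signal + common_noise + 0.1 * rng.normal(size=T)
residual = hsr_denoise(target, predictors)
```

Because the predictors carry the systematic noise but not the signal, the regression removes the former and the residual tracks the latter; appending observing conditions as extra columns of `X` is the paper's proposed extension.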
    mulEEG: A Multi-View Representation Learning on EEG Signals. (arXiv:2204.03272v1 [cs.LG])
Modeling effective representations using multiple views that positively influence each other is challenging, and the existing methods perform poorly on Electroencephalogram (EEG) signals for sleep-staging tasks. In this paper, we propose a novel multi-view self-supervised method (mulEEG) for unsupervised EEG representation learning. Our method attempts to effectively utilize the complementary information available in multiple views to learn better representations. We introduce a diverse loss that further encourages complementary information across multiple views. Our method, with no access to labels, beats supervised training while outperforming multi-view baseline methods on transfer learning experiments carried out on sleep-staging tasks. We posit that our method was able to learn better representations by using complementary multi-views.
    Perceive, Represent, Generate: Translating Multimodal Information to Robotic Motion Trajectories. (arXiv:2204.03051v1 [cs.RO])
    We present Perceive-Represent-Generate (PRG), a novel three-stage framework that maps perceptual information of different modalities (e.g., visual or sound), corresponding to a sequence of instructions, to an adequate sequence of movements to be executed by a robot. In the first stage, we perceive and pre-process the given inputs, isolating individual commands from the complete instruction provided by a human user. In the second stage we encode the individual commands into a multimodal latent space, employing a deep generative model. Finally, in the third stage we convert the multimodal latent values into individual trajectories and combine them into a single dynamic movement primitive, allowing its execution in a robotic platform. We evaluate our pipeline in the context of a novel robotic handwriting task, where the robot receives as input a word through different perceptual modalities (e.g., image, sound), and generates the corresponding motion trajectory to write it, creating coherent and readable handwritten words.
    DiffCloud: Real-to-Sim from Point Clouds with Differentiable Simulation and Rendering of Deformable Objects. (arXiv:2204.03139v1 [cs.RO])
    Research in manipulation of deformable objects is typically conducted on a limited range of scenarios, because handling each scenario on hardware takes significant effort. Realistic simulators with support for various types of deformations and interactions have the potential to speed up experimentation with novel tasks and algorithms. However, for highly deformable objects it is challenging to align the output of a simulator with the behavior of real objects. Manual tuning is not intuitive, hence automated methods are needed. We view this alignment problem as a joint perception-inference challenge and demonstrate how to use recent neural network architectures to successfully perform simulation parameter inference from real point clouds. We analyze the performance of various architectures, comparing their data and training requirements. Furthermore, we propose to leverage differentiable point cloud sampling and differentiable simulation to significantly reduce the time to achieve the alignment. We employ an efficient way to propagate gradients from point clouds to simulated meshes and further through to the physical simulation parameters, such as mass and stiffness. Experiments with highly deformable objects show that our method can achieve comparable or better alignment with real object behavior, while reducing the time needed to achieve this by more than an order of magnitude. Videos and supplementary material are available at https://tinyurl.com/diffcloud.
    Machine Learning-Enabled IoT Security: Open Issues and Challenges Under Advanced Persistent Threats. (arXiv:2204.03433v1 [cs.CR])
Despite its technological benefits, the Internet of Things (IoT) has cyber weaknesses due to vulnerabilities in the wireless medium. Machine learning (ML)-based methods are widely used against cyber threats in IoT networks with promising performance. The advanced persistent threat (APT) is a prominent means for cybercriminals to compromise networks, notable for its long-term and harmful characteristics. However, it is difficult for ML-based approaches to identify APT attacks with promising detection performance, because APT traffic constitutes an extremely small percentage of normal traffic. There are few surveys that fully investigate APT attacks in IoT networks, owing to the lack of public datasets covering all types of APT attacks. It is worthwhile to bridge the state-of-the-art in network attack detection with APT attack detection in a comprehensive review article. This survey article reviews the security challenges in IoT networks and presents the well-known attacks, APT attacks, and threat models in IoT systems. Meanwhile, signature-based, anomaly-based, and hybrid intrusion detection systems are summarized for IoT networks. The article highlights statistical insights regarding frequently applied ML-based methods against network intrusion, alongside the number of attack types detected. Finally, open issues and challenges for common network intrusion and APT attacks are presented for future research.
    Few-Shot Forecasting of Time-Series with Heterogeneous Channels. (arXiv:2204.03456v1 [cs.LG])
Learning complex time series forecasting models usually requires a large amount of data, as each model is trained from scratch for each task/data set. Leveraging learning experience with similar datasets is a well-established technique for classification problems, called few-shot classification. However, existing approaches cannot be applied to time-series forecasting because i) multivariate time-series datasets have different channels and ii) forecasting is principally different from classification. In this paper we formalize the problem of few-shot forecasting of time-series with heterogeneous channels for the first time. Extending recent work on heterogeneous attributes in vector data, we develop a model composed of permutation-invariant deep set-blocks which incorporate a temporal embedding. We assemble the first meta-dataset of 40 multivariate time-series datasets and show through experiments that our model generalizes well, outperforming baselines carried over from simpler scenarios that either fail to learn across tasks or miss temporal information.
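The permutation-invariant deep-set block can be sketched in a few lines: encode each channel independently, sum-pool, then decode, so the output does not depend on channel order or count. A toy NumPy version (`phi`, `rho`, and the channel values are illustrative placeholders; the paper's blocks additionally incorporate a temporal embedding):

```python
import numpy as np

def set_block(channels, phi, rho):
    """Deep-set block: per-channel encoder phi, sum pooling, head rho.
    Sum pooling makes the output invariant to channel permutations
    and applicable to any number of heterogeneous channels."""
    return rho(sum(phi(c) for c in channels))

phi = lambda c: np.tanh(c)    # per-channel encoder (toy)
rho = lambda h: 2.0 * h       # post-pooling head (toy)
channels = [np.array([0.1, 0.2]), np.array([0.3, 0.4]), np.array([-0.5, 0.0])]
out = set_block(channels, phi, rho)
perm = set_block(channels[::-1], phi, rho)   # reordered channels
```

Feeding the channels in any order yields the same output, which is what lets one meta-trained model handle datasets whose channel sets differ.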
    Domain Adaptation for Time-Series Classification to Mitigate Covariate Shift. (arXiv:2204.03342v1 [cs.LG])
The performance of a machine learning model degrades when it is applied to data from a similar but different domain than the data it has initially been trained on. To mitigate this domain shift problem, domain adaptation (DA) techniques search for an optimal transformation that converts the (current) input data from a source domain to a target domain to learn domain-invariant representations that reduce domain discrepancy. This paper proposes a novel supervised domain adaptation based on two steps. First, we search for an optimal class-dependent transformation from the source to the target domain from a few samples. We consider optimal transport methods such as the earth mover's distance with Laplacian regularization, Sinkhorn transport and correlation alignment. Second, we use embedding similarity techniques to select the corresponding transformation at inference time. We use correlation metrics and maximum mean discrepancy with higher-order moment matching techniques. We conduct an extensive evaluation on time-series datasets with domain shift, including simulated data and various online handwriting datasets, to demonstrate the performance.
    Energy-Efficient Adaptive Machine Learning on IoT End-Nodes With Class-Dependent Confidence. (arXiv:2204.03431v1 [cs.LG])
Energy-efficient machine learning models that can run directly on edge devices are of great interest in IoT applications, as they can reduce network pressure and response latency, and improve privacy. An effective way to obtain energy-efficiency with small accuracy drops is to sequentially execute a set of increasingly complex models, early-stopping the procedure for "easy" inputs that can be confidently classified by the smallest models. As a stopping criterion, current methods employ a single threshold on the output probabilities produced by each model. In this work, we show that such a criterion is sub-optimal for datasets that include classes of different complexity, and we demonstrate a more general approach based on per-class thresholds. With experiments on a low-power end-node, we show that our method can significantly reduce the energy consumption compared to the single-threshold approach.
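The per-class criterion is a small change to the usual cascade loop: the threshold consulted depends on the class the current model predicts. A minimal sketch with two hypothetical models over three classes (all probabilities and thresholds are made up):

```python
def cascade_predict(models, thresholds, x):
    """Run increasingly complex models, early-stopping as soon as the
    current model's confidence exceeds the *per-class* threshold of its
    predicted class (rather than a single global threshold)."""
    for model in models[:-1]:
        probs = model(x)
        cls = max(range(len(probs)), key=probs.__getitem__)
        if probs[cls] >= thresholds[cls]:
            return cls, model
    probs = models[-1](x)           # largest model: always answers
    return max(range(len(probs)), key=probs.__getitem__), models[-1]

# Hypothetical two-stage cascade over 3 classes:
small = lambda x: [0.7, 0.2, 0.1]   # cheap model, fairly confident in class 0
big = lambda x: [0.1, 0.8, 0.1]     # expensive fallback
thresholds = [0.6, 0.9, 0.9]        # class 0 is "easy": lower bar to exit
cls, used = cascade_predict([small, big], thresholds, x=None)

# With a stricter bar on class 0, the same input falls through to `big`:
cls2, used2 = cascade_predict([small, big], [0.9, 0.9, 0.9], x=None)
```

Tuning one threshold per class lets easy classes exit at the cheap model while hard classes are deferred, which is where the energy savings come from.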
    PALBERT: Teaching ALBERT to Ponder. (arXiv:2204.03276v1 [cs.LG])
Currently, pre-trained models can be considered the default choice for a wide range of NLP tasks. Despite their SoTA results, there is practical evidence that these models may require a different number of computing layers for different input sequences, since evaluating all layers leads to overconfidence in wrong predictions (namely, overthinking). This problem can potentially be solved by implementing adaptive computation time approaches, which were first designed to improve inference speed. The recently proposed PonderNet may be a promising solution for performing an early exit by treating the exit layer's index as a latent variable. However, the originally proposed exit criterion, which relies on sampling from the trained posterior distribution over the probability of exiting from the i-th layer, introduces major variance in model outputs, significantly reducing the resulting model's performance. In this paper, we propose Ponder ALBERT (PALBERT): an improvement to PonderNet with a novel deterministic Q-exit criterion and a revisited model architecture. We compared PALBERT with recent methods for performing an early exit. We observed that the proposed changes can be considered significant improvements to the original PonderNet architecture, and that PALBERT outperforms PABEE on a wide range of GLUE tasks. In addition, we performed an in-depth ablation study of the proposed architecture to further understand Lambda layers and their performance.
    Continual Inference: A Library for Efficient Online Inference with Deep Neural Networks in PyTorch. (arXiv:2204.03418v1 [cs.LG])
    We present Continual Inference, a Python library for implementing Continual Inference Networks (CINs) in PyTorch, a class of Neural Networks designed specifically for efficient inference in both online and batch processing scenarios. We offer a comprehensive introduction and guide to CINs and their implementation in practice, and provide best-practices and code examples for composing complex modules for modern Deep Learning. Continual Inference is readily downloadable via the Python Package Index and at \url{www.github.com/lukashedegaard/continual-inference}.
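The underlying idea of a Continual Inference Network can be illustrated without the library: cache just enough past input that each new time step costs one kernel application instead of recomputing the whole clip. A dependency-free sketch for a 1-D convolution (the class name and kernel are illustrative, not the library's API):

```python
from collections import deque

class ContinualConv1d:
    """Sketch of continual inference for a 1-D convolution: keep the
    last (kernel_size - 1) inputs in a ring buffer so that each new
    time step costs a single kernel dot-product."""
    def __init__(self, kernel):
        self.kernel = kernel
        # Zero-padded history, automatically bounded by maxlen.
        self.buffer = deque([0.0] * (len(kernel) - 1),
                            maxlen=len(kernel) - 1)

    def step(self, x):
        window = list(self.buffer) + [x]   # cached past + new sample
        self.buffer.append(x)              # oldest sample drops out
        return sum(w * v for w, v in zip(self.kernel, window))

conv = ContinualConv1d([1.0, 2.0, 3.0])
stream = [1.0, 1.0, 1.0, 2.0]
outputs = [conv.step(x) for x in stream]   # one output per incoming sample
```

The same buffering trick generalises to pooling and attention over sliding windows, which is what makes per-frame online inference cheap relative to re-running a clip-based network.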
    Using Decision Tree as Local Interpretable Model in Autoencoder-based LIME. (arXiv:2204.03321v1 [cs.LG])
Nowadays, deep neural networks are being used in many domains because of their highly accurate results. However, they are considered "black boxes", meaning that they are not explainable to humans. On the other hand, in some tasks such as medicine, economics, and self-driving cars, users want the model to be interpretable so they can decide whether to trust its results. In this work, we present a modified version of an autoencoder-based approach for local interpretability called ALIME. ALIME itself is inspired by a famous method called Local Interpretable Model-agnostic Explanations (LIME). LIME generates a single instance-level explanation by generating new data around the instance and training a local linear interpretable model. ALIME uses an autoencoder to weigh the new data around the sample. Nevertheless, ALIME still uses a linear model as the interpretable model to be trained locally, just like LIME. This work proposes a new approach, which uses a decision tree instead of the linear model as the interpretable model. We evaluate the proposed model in terms of stability, local fidelity, and interpretability on different datasets. Compared to ALIME, the experiments show significant gains in stability and local fidelity, and improved interpretability.
    Fusing finetuned models for better pretraining. (arXiv:2204.03044v1 [cs.CL])
Pretrained models are the standard starting point for training. This approach consistently outperforms the use of a random initialization. However, pretraining is a costly endeavour that few can undertake. In this paper, we create better base models at hardly any cost, by fusing multiple existing fine-tuned models into one. Specifically, we fuse by averaging the weights of these models. We show that the fused model's results surpass those of the pretrained model. We also show that fusing is often better than intertraining. We find that fusing is less dependent on the target task. Furthermore, weight decay nullifies intertraining effects but not those of fusing.
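The fusion recipe itself is simple: average the fine-tuned models' weights parameter by parameter, which requires them to share one architecture. A dependency-free sketch over toy state dicts (real models would average tensors, e.g. PyTorch state_dicts; the layer names and values here are hypothetical):

```python
def fuse(state_dicts):
    """Fuse fine-tuned models sharing one architecture by averaging
    their weights, parameter by parameter."""
    n = len(state_dicts)
    return {k: [sum(vals) / n
                for vals in zip(*(sd[k] for sd in state_dicts))]
            for k in state_dicts[0]}

# Toy "models": flat per-layer weight lists (hypothetical values).
m1 = {"layer.w": [1.0, 2.0], "layer.b": [0.0]}
m2 = {"layer.w": [3.0, 4.0], "layer.b": [2.0]}
fused = fuse([m1, m2])
```

The fused dict then serves as the initialization (base model) for fine-tuning on a new target task, in place of the original pretrained checkpoint.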
    Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators. (arXiv:2204.03243v1 [cs.CL])
    We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators. Following ELECTRA-style pretraining, the main encoder is trained as a discriminator to detect replaced tokens generated by auxiliary masked language models (MLMs). Different from ELECTRA which trains one MLM as the generator, we jointly train multiple MLMs of different sizes to provide training signals at various levels of difficulty. To push the discriminator to learn better with challenging replaced tokens, we learn mixture weights over the auxiliary MLMs' outputs to maximize the discriminator loss by backpropagating the gradient from the discriminator via Gumbel-Softmax. For better pretraining efficiency, we propose a way to assemble multiple MLMs into one unified auxiliary model. AMOS outperforms ELECTRA and recent state-of-the-art pretrained models by about 1 point on the GLUE benchmark for BERT base-sized models.
    A Joint Learning Approach for Semi-supervised Neural Topic Modeling. (arXiv:2204.03208v1 [cs.IR])
Topic models are some of the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend upon these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the extent of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled data regimes and for datasets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.
    Distributed Statistical Min-Max Learning in the Presence of Byzantine Agents. (arXiv:2204.03187v1 [cs.LG])
    Recent years have witnessed a growing interest in the topic of min-max optimization, owing to its relevance in the context of generative adversarial networks (GANs), robust control and optimization, and reinforcement learning. Motivated by this line of work, we consider a multi-agent min-max learning problem, and focus on the emerging challenge of contending with worst-case Byzantine adversarial agents in such a setup. By drawing on recent results from robust statistics, we design a robust distributed variant of the extra-gradient algorithm - a popular algorithmic approach for min-max optimization. Our main contribution is to provide a crisp analysis of the proposed robust extra-gradient algorithm for smooth convex-concave and smooth strongly convex-strongly concave functions. Specifically, we establish statistical rates of convergence to approximate saddle points. Our rates are near-optimal, and reveal both the effect of adversarial corruption and the benefit of collaboration among the non-faulty agents. Notably, this is the first paper to provide formal theoretical guarantees for large-scale distributed min-max learning in the presence of adversarial agents.
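For intuition, here is a minimal sketch of the (non-robust) extra-gradient method on a toy bilinear saddle problem; the step size and objective are illustrative, and the Byzantine-robust aggregation studied in the paper is omitted:

```python
def extra_gradient(x, y, eta=0.1, steps=1000):
    """Extra-gradient on the toy bilinear saddle f(x, y) = x * y.

    Plain simultaneous gradient descent-ascent spirals away from the
    saddle on this problem; the look-ahead step makes it converge.
    """
    for _ in range(steps):
        # Look-ahead (extrapolation) step: grad_x f = y, grad_y f = x.
        xh = x - eta * y
        yh = y + eta * x
        # Update using gradients evaluated at the look-ahead point.
        x, y = x - eta * yh, y + eta * xh
    return x, y

x, y = extra_gradient(1.0, 1.0)
print(x, y)  # both approach the saddle point (0, 0)
```

In the distributed setting of the paper, each gradient evaluation would be replaced by a robust aggregate of the agents' gradients.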
    Transformer-Based Language Models for Software Vulnerability Detection: Performance, Model's Security and Platforms. (arXiv:2204.03214v1 [cs.CR])
    Large transformer-based language models demonstrate excellent performance in natural language processing. Given the closeness of natural languages to high-level programming languages such as C/C++, this work studies how well large transformer-based language models detect software vulnerabilities. Our results demonstrate that these models perform well on software vulnerability detection, which enables extending transformer-based language models to vulnerability detection and leveraging their superior performance beyond the natural language processing domain. In addition, we assess model security using Microsoft's Counterfit, a command-line tool for evaluating the security of ML models, and find that these models are vulnerable to adversarial examples. In this regard, we present a simple countermeasure and its results. Experimenting with large models is always a challenge due to the required computing resources, platforms, libraries, and dependencies. Based on the experiences and difficulties we faced during this work, we present recommendations for choosing platforms to run these large models, and survey the popular platforms thoroughly in this paper.
    Offline Reinforcement Learning for Safer Blood Glucose Control in People with Type 1 Diabetes. (arXiv:2204.03376v1 [cs.LG])
    Hybrid closed loop systems represent the future of care for people with type 1 diabetes (T1D). These devices usually utilise simple control algorithms to select the optimal insulin dose for maintaining blood glucose levels within a healthy range. Online reinforcement learning (RL) has been utilised as a method for further enhancing glucose control in these devices. Previous approaches have been shown to reduce patient risk and improve time spent in the target range when compared to classical control algorithms, but are prone to instability in the learning process, often resulting in the selection of unsafe actions. This work presents an evaluation of offline RL as a means for developing clinically effective dosing policies without the need for patient interaction. This paper examines the utility of BCQ, CQL and TD3-BC in managing the blood glucose of nine virtual patients within the UVA/Padova glucose dynamics simulator. When trained on less than a tenth of the data required by online RL approaches, this work shows that offline RL can significantly increase time in the healthy blood glucose range when compared to the strongest state-of-art baseline. This is achieved without any associated increase in low blood glucose events. Offline RL is also shown to be able to correct for common and challenging scenarios such as incorrect bolus dosing, irregular meal timings and sub-optimal training data.
    Graph Neural Networks Designed for Different Graph Types: A Survey. (arXiv:2204.03080v1 [cs.LG])
    Graphs are ubiquitous in nature and can therefore serve as models for many practical as well as theoretical problems. Based on this, the young research field of Graph Neural Networks (GNNs) has emerged. Despite the youth of the field and the speed at which new models are developed, many good surveys have been published in recent years. Nevertheless, an overview of which graph types can be modeled by GNNs is missing. In this survey, we give a detailed overview of existing GNNs and, unlike previous surveys, categorize them according to their ability to handle different graph types. We consider GNNs operating on static as well as dynamic graphs of different structural constitutions, with or without node or edge attributes. Moreover, in the dynamic case, we separate the models into discrete-time and continuous-time dynamic graphs based on their architecture. According to our findings, there are still graph types that are not covered by existing GNN models. Specifically, models concerning heterogeneity in attributes are missing, and the deletion of nodes and edges is only rarely covered.
    Jacobian Norm for Unsupervised Source-Free Domain Adaptation. (arXiv:2204.03467v1 [cs.LG])
    Unsupervised Source (data) Free domain adaptation (USFDA) aims to transfer knowledge from a well-trained source model to a related but unlabeled target domain. In such a scenario, all conventional adaptation methods that require source data fail. To combat this challenge, existing USFDA methods transfer knowledge by aligning the target features to the latent distribution hidden in the source model. However, such information is naturally limited, so the alignment in this scenario is not only difficult but also insufficient, which degrades the target generalization performance. To relieve this dilemma, we explore a new perspective to boost the performance of current USFDA methods. To gain the necessary insight, we look back at the origins of domain adaptation and first theoretically derive a brand-new target generalization error bound based on model smoothness. Then, following this theoretical insight, we design a general, model-smoothness-guided Jacobian norm (JN) regularizer and impose it on the target domain to mitigate the dilemma. Extensive experiments are conducted to validate its effectiveness. With just a few lines of code added to existing USFDA methods, we achieve superior results on various benchmark datasets.
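A minimal sketch of the idea behind a Jacobian-norm penalty (our illustration, not the paper's implementation): estimate the Frobenius norm of the model's input-output Jacobian, here with finite differences on a toy linear model where the exact value is known:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))          # toy "model": f(x) = W @ x
f = lambda x: W @ x

def jacobian_fro_norm(f, x, eps=1e-5):
    """Estimate ||J_f(x)||_F with forward finite differences."""
    fx = f(x)
    # One Jacobian column per input coordinate.
    J = np.stack([(f(x + eps * e) - fx) / eps
                  for e in np.eye(len(x))], axis=1)
    return np.linalg.norm(J)

x = rng.normal(size=5)
jn = jacobian_fro_norm(f, x)
print(jn, np.linalg.norm(W))  # for a linear map, the estimate matches ||W||_F
```

In practice one would add this norm (or an efficient stochastic estimate of it) to the target-domain loss as a smoothness regularizer.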
    Multi-Task Distributed Learning using Vision Transformer with Random Patch Permutation. (arXiv:2204.03500v1 [cs.LG])
    The widespread application of artificial intelligence in health research is currently hampered by limitations in data availability. Distributed learning methods such as federated learning (FL) and split learning (SL) have been introduced to solve this problem, as well as data management and ownership issues, each with different strengths and weaknesses. The recently proposed federated split task-agnostic (FeSTA) learning tries to reconcile the distinct merits of FL and SL by enabling multi-task collaboration between participants through a Vision Transformer (ViT) architecture, but it suffers from higher communication overhead. To address this, we present a multi-task distributed learning method using a ViT with random patch permutation. Instead of using a CNN-based head as in FeSTA, p-FeSTA adopts a randomly permuting simple patch embedder, improving multi-task learning performance without sacrificing privacy. Experimental results confirm that the proposed method significantly enhances the benefit of multi-task collaboration, communication efficiency, and privacy preservation, shedding light on practical multi-task distributed learning in the field of medical imaging.
    Self-Supervised Learning to Prove Equivalence Between Programs via Semantics-Preserving Rewrite Rules. (arXiv:2109.10476v2 [cs.LG] UPDATED)
    We target the problem of automatically synthesizing proofs of semantic equivalence between two programs made of sequences of statements. We represent programs using abstract syntax trees (AST), where a given set of semantics-preserving rewrite rules can be applied on a specific AST pattern to generate a transformed and semantically equivalent program. In our system, two programs are equivalent if there exists a sequence of application of these rewrite rules that leads to rewriting one program into the other. We propose a neural network architecture based on a transformer model to generate proofs of equivalence between program pairs. The system outputs a sequence of rewrites, and the validity of the sequence is simply checked by verifying it can be applied. If no valid sequence is produced by the neural network, the system reports the programs as non-equivalent, ensuring by design no programs may be incorrectly reported as equivalent. Our system is fully implemented for a given grammar. To efficiently train the system to generate such sequences, we develop an original incremental training technique, named self-supervised sample selection. We extensively study the effectiveness of this novel training approach on proofs of increasing complexity and length. Our system, S4Eq, achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent programs.
    RF Signal Transformation and Classification using Deep Neural Networks. (arXiv:2204.03564v1 [eess.SP])
    Deep neural networks (DNNs) designed for computer vision and natural language processing tasks cannot be directly applied to the radio frequency (RF) datasets. To address this challenge, we propose to convert the raw RF data to data types that are suitable for off-the-shelf DNNs by introducing a convolutional transform technique. In addition, we propose a simple 5-layer convolutional neural network architecture (CONV-5) that can operate with raw RF I/Q data without any transformation. Further, we put forward an RF dataset, referred to as RF1024, to facilitate future RF research. RF1024 consists of 8 different RF modulation classes with each class having 1000/200 training/test samples. Each sample of the RF1024 dataset contains 1024 complex I/Q values. Lastly, the experiments are performed on the RadioML2016 and RF1024 datasets to demonstrate the improved classification performance.
    AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis. (arXiv:2204.03105v1 [cs.CV])
    In this paper, we address the problem of texture representation for 3D shapes for the challenging and underexplored tasks of texture transfer and synthesis. Previous works either apply spherical texture maps which may lead to large distortions, or use continuous texture fields that yield smooth outputs lacking details. We argue that the traditional way of representing textures with images and linking them to a 3D mesh via UV mapping is more desirable, since synthesizing 2D images is a well-studied problem. We propose AUV-Net which learns to embed 3D surfaces into a 2D aligned UV space, by mapping the corresponding semantic parts of different 3D shapes to the same location in the UV space. As a result, textures are aligned across objects, and can thus be easily synthesized by generative models of images. Texture alignment is learned in an unsupervised manner by a simple yet effective texture alignment module, taking inspiration from traditional works on linear subspace learning. The learned UV mapping and aligned texture representations enable a variety of applications including texture transfer, texture synthesis, and textured single view 3D reconstruction. We conduct experiments on multiple datasets to demonstrate the effectiveness of our method. Project page: https://nv-tlabs.github.io/AUV-NET.
    FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to System Heterogeneity. (arXiv:2204.03529v1 [cs.LG])
    Federated Learning (FL) is an emerging framework for distributed processing of large data volumes by edge devices subject to limited communication bandwidths, heterogeneity in data distributions and computational resources, as well as privacy considerations. In this paper, we introduce a new FL protocol termed FedADMM based on primal-dual optimization. The proposed method leverages dual variables to tackle statistical heterogeneity, and accommodates system heterogeneity by tolerating variable amounts of work performed by clients. FedADMM maintains identical communication costs per round as FedAvg/Prox, and generalizes them via the augmented Lagrangian. A convergence proof is established for nonconvex objectives, with no restrictions in terms of data dissimilarity or the number of participants per round of the algorithm. We demonstrate the merits through extensive experiments on real datasets, under both IID and non-IID data distributions across clients. FedADMM consistently outperforms all baseline methods in terms of communication efficiency, with the number of rounds needed to reach a prescribed accuracy reduced by up to 87%. The algorithm effectively adapts to heterogeneous data distributions through the use of dual variables, without the need for hyperparameter tuning, and its advantages are more pronounced in large-scale systems.
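To illustrate the primal-dual flavor of such a protocol, here is a toy consensus-ADMM sketch under simplifying assumptions (quadratic client losses with closed-form updates; this is our illustration, not the FedADMM algorithm itself). Each client i holds f_i(w) = 0.5 * ||w - c_i||^2, so the global optimum is the mean of the c_i:

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(4, 3))           # one target c_i per client
rho = 1.0                             # augmented-Lagrangian penalty
z = np.zeros(3)                       # server (consensus) model
lam = np.zeros_like(C)                # one dual variable per client

for _ in range(100):
    # Local primal updates (closed form for the quadratic losses).
    W = (C - lam + rho * z) / (1 + rho)
    # Server aggregates primal plus scaled dual variables.
    z = (W + lam / rho).mean(axis=0)
    # Dual ascent on each client.
    lam += rho * (W - z)

print(z, C.mean(axis=0))  # z converges to the mean of the c_i
```

The dual variables `lam` absorb the per-client disagreement, which is the mechanism the paper uses to cope with statistical heterogeneity.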
    MTI-Net: A Multi-Target Speech Intelligibility Prediction Model. (arXiv:2204.03310v1 [eess.AS])
    Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies elaborate deep learning models for estimating intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-task learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, it is confirmed that the intelligibility and WER scores predicted by MTI-Net are highly correlated with the ground-truth scores.
    AI-aided Traffic Control Scheme for M2M Communications in the Internet of Vehicles. (arXiv:2204.03504v1 [cs.NI])
    Due to the rapid growth of data transmissions in the internet of vehicles (IoV), finding schemes that can effectively alleviate access congestion has become an important issue. Recently, many traffic control schemes have been studied. Nevertheless, the dynamics of traffic and the heterogeneous requirements of different IoV applications are not considered in most existing studies, although they are significant for random access resource allocation. In this paper, we consider a hybrid traffic control scheme and use the proximal policy optimization (PPO) method to tackle it. First, IoV devices are divided into various classes based on their delay characteristics, and the objective of maximizing the number of successfully transmitted packets under a success rate constraint is established. Then, the optimization objective is transformed into a Markov decision process (MDP) model. Finally, the access class barring (ACB) factors are obtained with the PPO method to maximize the number of successfully accessing devices. Simulations verify that the proposed algorithm outperforms existing schemes in terms of successful access events and delay.
    Enabling Deep Learning for All-in EDGE paradigm. (arXiv:2204.03326v1 [cs.LG])
    Deep Learning-based models have been widely investigated and have demonstrated significant performance on non-trivial tasks such as speech recognition, image processing, and natural language understanding, albeit at the cost of substantial data requirements. Considering the widespread proliferation of edge devices (e.g. Internet of Things devices) over the last decade, Deep Learning in the edge paradigm, such as on device-cloud integrated platforms, is required to leverage this superior performance. Such a setting is also suitable from the data requirements perspective, because the proliferation of edge devices has resulted in an explosion in the volume of generated and collected data. However, Deep Learning applications in real-world scenarios bring difficulties due to further demands on computation, latency, and bandwidth. In this regard, this survey paper investigates Deep Learning at the edge, covering its architecture, enabling technologies, and model adaptation techniques, where edge servers and edge devices participate in deep learning training and inference. For simplicity, we call this paradigm the All-in EDGE paradigm. This paper also presents the key performance metrics for Deep Learning in the All-in EDGE paradigm to evaluate various deep learning techniques and choose a suitable design. Moreover, various open challenges arising from the deployment of Deep Learning in the All-in EDGE paradigm are identified and discussed.
    Standardized feature extraction from pairwise conflicts applied to the train rescheduling problem. (arXiv:2204.03061v1 [cs.LG])
    We propose a train rescheduling algorithm which applies a standardized feature selection based on pairwise conflicts in order to serve as input for the reinforcement learning framework. We implement an analytical method which identifies and optimally solves every conflict arising between two trains, then we design a corresponding observation space which features the most relevant information considering these conflicts. The data obtained this way then translates to actions in the context of the reinforcement learning framework. We test our preliminary model using the evaluation metrics of the Flatland Challenge. The empirical results indicate that the suggested feature space provides meaningful observations, from which a sensible scheduling policy can be learned.
    Optimization Models and Interpretations for Three Types of Adversarial Perturbations against Support Vector Machines. (arXiv:2204.03154v1 [cs.LG])
    Adversarial perturbations have drawn great attention in various deep neural networks. Most of them are computed by iterative procedures and cannot be interpreted very well. In contrast, little attention is paid to basic machine learning models such as support vector machines. In this paper, we investigate the optimization models and interpretations for three types of adversarial perturbations against support vector machines: sample-adversarial perturbations (sAP), class-universal adversarial perturbations (cuAP), and universal adversarial perturbations (uAP). For linear binary/multi-class support vector machines (SVMs), we derive explicit solutions for sAP, cuAP, and uAP (binary case), and an approximate solution for uAP in the multi-class case. We also obtain an upper bound on the fooling rate of uAP. These results not only increase the interpretability of the three adversarial perturbations, but also provide great computational convenience since the iterative process can be avoided. Numerical results show that our method is fast and effective in calculating the three types of adversarial perturbations.
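For the linear binary case, the sample-adversarial intuition admits a well-known closed form: the smallest L2 perturbation moving x across the hyperplane w^T x + b = 0 is -f(x) w / ||w||^2. A small sketch of that geometry (our illustration with made-up weights, not the paper's exact derivation):

```python
import numpy as np

w = np.array([2.0, -1.0])
b = 0.5
f = lambda x: w @ x + b              # linear SVM decision function

def minimal_perturbation(x, overshoot=1e-3):
    """Smallest L2 perturbation moving x across w^T x + b = 0.

    The overshoot pushes x slightly past the boundary so the
    predicted label actually flips.
    """
    return -(1 + overshoot) * f(x) / (w @ w) * w

x = np.array([1.0, 1.0])             # f(x) = 1.5, classified positive
delta = minimal_perturbation(x)
print(np.sign(f(x)), np.sign(f(x + delta)))  # label flips: 1.0 -> -1.0
```

The perturbation norm is |f(x)| / ||w||, i.e. the point's distance to the decision boundary, which is what makes the closed form interpretable.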
    Temporal Alignment for History Representation in Reinforcement Learning. (arXiv:2204.03525v1 [cs.LG])
    Environments in Reinforcement Learning are usually only partially observable. To address this problem, a possible solution is to provide the agent with information about the past. However, providing complete observations of numerous steps can be excessive. Inspired by human memory, we propose to represent history with only important changes in the environment and, in our approach, to obtain this representation automatically using self-supervision. Our method (TempAl) aligns temporally-close frames, revealing a general, slowly varying state of the environment. This procedure is based on a contrastive loss, which pulls embeddings of nearby observations toward each other while pushing away other samples from the batch. It can be interpreted as a metric that captures the temporal relations of observations. We combine both the common instantaneous representation and our history representation, and evaluate TempAl on all available Atari games from the Arcade Learning Environment. TempAl surpasses the instantaneous-only baseline in 35 environments out of 49. The source code of the method and of all the experiments is available at https://github.com/htdt/tempal.
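The contrastive objective described here is InfoNCE-style; a small numpy sketch (illustrative, not the TempAl code) where embeddings of temporally adjacent frames form the positive pairs and the rest of the batch serves as negatives:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Contrastive loss: each anchor's positive sits on the diagonal;
    all other batch entries act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature            # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                  # embeddings of frames t
z_next = z + 0.05 * rng.normal(size=z.shape)  # frames t+1, close to z
loss_aligned = info_nce(z, z_next)
loss_random = info_nce(z, rng.normal(size=z.shape))
print(loss_aligned, loss_random)  # aligned pairs give a much lower loss
```

Minimizing this loss pulls temporally adjacent embeddings together, which is the property the history representation relies on.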
    Multi-task nonparallel support vector machine for classification. (arXiv:2204.02972v1 [cs.LG])
    The direct multi-task twin support vector machine (DMTSVM) explores the shared information between multiple correlated tasks and thus produces better generalization performance. However, it involves a matrix inversion operation when solving the dual problems, which is computationally expensive, and the kernel trick cannot be directly utilized in the nonlinear case. To effectively avoid the above problems, a novel multi-task nonparallel support vector machine (MTNPSVM), covering both linear and nonlinear cases, is proposed in this paper. By introducing the epsilon-insensitive loss instead of the square loss in DMTSVM, MTNPSVM effectively avoids the matrix inversion operation and takes full advantage of the kernel trick. The theoretical implications of the model are further discussed. To further improve computational efficiency, the alternating direction method of multipliers (ADMM) is employed when solving the dual problem, and the computational complexity and convergence of the algorithm are provided. In addition, the properties and sensitivity of the model parameter are explored. Experimental results on fifteen benchmark datasets and twelve image datasets demonstrate the validity of MTNPSVM in comparison with state-of-the-art algorithms. Finally, MTNPSVM is applied to a real Chinese Wine dataset, which further verifies its effectiveness.
    Federated Learning for Distributed Spectrum Sensing in NextG Communication Networks. (arXiv:2204.03027v1 [cs.NI])
    NextG networks are intended to provide the flexibility of sharing the spectrum with incumbent users and support various spectrum monitoring tasks such as anomaly detection, fault diagnostics, user equipment identification, and authentication. A network of wireless sensors is needed to monitor the spectrum for signal transmissions of interest over a large deployment area. Each sensor receives signals under a specific channel condition depending on its location and trains an individual model of a deep neural network (DNN) accordingly to classify signals. To improve the accuracy, individual sensors may exchange sensing data or sensor results with each other or with a fusion center (such as in cooperative spectrum sensing). In this paper, distributed federated learning over a multi-hop wireless network is considered to collectively train a DNN for signal identification. In distributed federated learning, each sensor broadcasts its trained model to its neighbors, collects the DNN models from its neighbors, and aggregates them to initialize its own model for the next round of training. Without exchanging any spectrum data, this process is repeated over time such that a common DNN is built across the network while preserving the privacy associated with signals collected at different locations. Signal classification accuracy and convergence time are evaluated for different network topologies (including line, star, ring, grid, and random networks) and packet loss events. Then, the reduction of communication overhead and energy consumption is considered with random participation of sensors in model updates. The results show the feasibility of extending cooperative spectrum sensing over a general multi-hop wireless network through federated learning and indicate its robustness to wireless network effects, thereby sustaining high accuracy with low communication overhead and energy consumption.
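The neighbor-aggregation step can be sketched as gossip averaging on a ring (a toy illustration under our own assumptions, not the paper's protocol): each round, every sensor replaces its model with the mean of its own and its two ring neighbors' models, and all sensors converge to a common model without exchanging any spectrum data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
models = rng.normal(size=(n, 4))     # one parameter vector per sensor
target = models.mean(axis=0)         # the consensus point

for _ in range(100):
    # Each sensor averages itself with its two ring neighbors.
    left = np.roll(models, 1, axis=0)
    right = np.roll(models, -1, axis=0)
    models = (models + left + right) / 3

print(np.abs(models - target).max())  # near zero: all sensors agree
```

Because the averaging weights are doubly stochastic, the mean of the models is preserved each round, so the network converges to the average of the initial models.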
    Faster algorithms for learning to link, align sequences, and price two-part tariffs. (arXiv:2204.03569v1 [cs.DS])
    Data-driven algorithm configuration is a promising, learning-based approach for beyond worst-case analysis of algorithms with tunable parameters. An important open problem is the design of efficient data-driven algorithms for algorithm families with more than one parameter. In this work we provide algorithms for efficient (output-polynomial) multidimensional parameter tuning, i.e. for families with a small constant number of parameters, for three very different combinatorial problems -- linkage-based clustering, dynamic programming for sequence alignment, and auction design for two-part tariff schemes. We extend the single-parameter clustering algorithm of Balcan et al. 2020 arXiv:1907.00533 to multiple parameters and to the sequence alignment problem by proposing an execution graph which compactly represents all the states the algorithm could attain for all possible parameter values. A key problem-specific challenge is to efficiently compute how the partition of the parameter space (into regions with unique algorithmic states) changes with a single algorithmic step. We give algorithms which improve on the runtime of previously best known results for linkage-based clustering, sequence alignment and two-part tariff pricing.
    The Effects of Regularization and Data Augmentation are Class Dependent. (arXiv:2204.03632v1 [cs.LG])
    Regularization is a fundamental technique to prevent over-fitting and to improve generalization performances by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight-decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that techniques such as DA or weight decay produce a model with a reduced complexity that is unfair across classes. The optimal amount of DA or weight decay found from cross-validation leads to disastrous model performances on some classes e.g. on Imagenet with a resnet50, the "barn spider" classification test accuracy falls from $68\%$ to $46\%$ only by introducing random crop DA during training. Even more surprising, such performance drop also appears when introducing uninformative regularization techniques such as weight decay. Those results demonstrate that our search for ever increasing generalization performance -- averaged over all classes and samples -- has left us with models and regularizers that silently sacrifice performances on some classes. This scenario can become dangerous when deploying a model on downstream tasks e.g. an Imagenet pre-trained resnet50 deployed on INaturalist sees its performances fall from $70\%$ to $30\%$ on class \#8889 when introducing random crop DA during the Imagenet pre-training phase. Those results demonstrate that designing novel regularizers without class-dependent bias remains an open research question.
    Knowledge Infused Decoding. (arXiv:2204.03084v1 [cs.CL])
    Pre-trained language models (LMs) have been shown to memorize a substantial amount of knowledge from the pre-training corpora; however, they are still limited in recalling factually correct knowledge given a certain context. Hence, they tend to suffer from counterfactual or hallucinatory generation when used in knowledge-intensive natural language generation (NLG) tasks. Recent remedies to this problem focus on modifying either the pre-training or task fine-tuning objectives to incorporate knowledge, which normally require additional costly training or architecture modification of LMs for practical applications. We present Knowledge Infused Decoding (KID) -- a novel decoding algorithm for generative LMs, which dynamically infuses external knowledge into each step of the LM decoding. Specifically, we maintain a local knowledge memory based on the current context, interacting with a dynamically created external knowledge trie, and continuously update the local memory as a knowledge-aware constraint to guide decoding via reinforcement learning. On six diverse knowledge-intensive NLG tasks, task-agnostic LMs (e.g., GPT-2 and BART) armed with KID outperform many task-optimized state-of-the-art models, and show particularly strong performance in few-shot scenarios over seven related knowledge-infusion techniques. Human evaluation confirms KID's ability to generate more relevant and factual language for the input context when compared with multiple baselines. Finally, KID also alleviates exposure bias and provides stable generation quality when generating longer sequences. Code for KID is available at https://github.com/microsoft/KID.
    Enhancement on Model Interpretability and Sleep Stage Scoring Performance with A Novel Pipeline Based on Deep Neural Network. (arXiv:2204.03173v1 [cs.LG])
    Considering the natural frequency characteristics in sleep medicine, this paper first proposes a time-frequency framework for representation learning of the electroencephalogram (EEG) following the definitions of the American Academy of Sleep Medicine. To accommodate the temporally random and transient nature of the characteristics defining sleep stages, we further design a context-sensitive, flexible pipeline that automatically adapts to the attributes of the data itself. That is, the input EEG spectrogram is partitioned into a sequence of patches along the time and frequency axes and then fed to a deep learning network for further representation learning, extracting the stage-dependent features that are finally used in the classification step. The proposed pipeline is validated on a large database, the Sleep Heart Health Study (SHHS), and the results demonstrate that its performance for the wake, N2, and N3 stages outperforms state-of-the-art works, with F1 scores of 0.93, 0.88, and 0.87, respectively, and a high inter-rater reliability of 0.80 kappa. Importantly, we visualize the stage-scoring process of the model's decisions with the Layer-wise Relevance Propagation (LRP) method, which shows that the proposed pipeline is more sensitive and perceivable in the decision-making process than the baseline pipelines. Therefore, the pipeline together with the LRP method can provide better model interpretability, which is important for clinical support.
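The time-frequency patching step can be sketched as follows (a generic illustration with made-up spectrogram and patch sizes, not the paper's exact partitioning):

```python
import numpy as np

def partition_patches(spec, pt, pf):
    """Split a (time, freq) spectrogram into a sequence of (pt, pf) patches."""
    T, F = spec.shape
    assert T % pt == 0 and F % pf == 0
    # Group rows into time blocks and columns into frequency bands,
    # then flatten the block grid into a patch sequence.
    return (spec.reshape(T // pt, pt, F // pf, pf)
                .transpose(0, 2, 1, 3)
                .reshape(-1, pt, pf))

spec = np.arange(8 * 6, dtype=float).reshape(8, 6)   # toy EEG spectrogram
patches = partition_patches(spec, pt=4, pf=3)
print(patches.shape)  # (4, 4, 3): 2 time blocks x 2 frequency bands
```

Each patch then becomes one token of the sequence handed to the downstream network.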
    Video Diffusion Models. (arXiv:2204.03458v1 [cs.CV])
    Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial results. Our model is a natural extension of the standard image diffusion architecture, and it enables jointly training from image and video data, which we find to reduce the variance of minibatch gradients and speed up optimization. To generate long and higher resolution videos we introduce a new conditional sampling technique for spatial and temporal video extension that performs better than previously proposed methods. We present the first results on a large text-conditioned video generation task, as well as state-of-the-art results on an established unconditional video generation benchmark. Supplementary material is available at https://video-diffusion.github.io/
    Incremental Unsupervised Feature Selection for Dynamic Incomplete Multi-view Data. (arXiv:2204.02973v1 [cs.LG])
    Multi-view unsupervised feature selection has been proven to be efficient in reducing the dimensionality of high-dimensional, unlabeled multi-view data. Previous methods assume that all of the views are complete. However, in real applications, multi-view data are often incomplete, i.e., some views of instances are missing, which causes these methods to fail. Besides, when the data arrive as streams, these existing methods suffer from high storage costs and expensive computation time. To address these issues, we propose an Incremental Incomplete Multi-view Unsupervised Feature Selection method (I$^2$MUFS) for incomplete multi-view streaming data. By jointly considering the consistent and complementary information across different views, I$^2$MUFS embeds unsupervised feature selection into an extended weighted non-negative matrix factorization model, which can learn a consensus clustering indicator matrix and fuse different latent feature matrices with adaptive view weights. Furthermore, we introduce incremental learning mechanisms to develop an alternating iterative algorithm, in which the feature selection matrix is updated incrementally rather than recomputed from scratch on the entire updated data. A series of experiments are conducted to verify the effectiveness of the proposed method by comparison with several state-of-the-art methods. The experimental results demonstrate the effectiveness and efficiency of the proposed method in terms of clustering metrics and computational cost.
    Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation. (arXiv:2204.03293v1 [cs.SE])
    Code search aims to retrieve the most semantically relevant code snippet for a given natural language query. Recently, large-scale code pre-trained models such as CodeBERT and GraphCodeBERT learn generic representations of source code and have achieved substantial improvement on the code search task. However, high-quality sequence-level representations of code snippets have not been sufficiently explored. In this paper, we propose a new approach with multimodal contrastive learning and soft data augmentation for code search. Multimodal contrastive learning is used to pull together the representations of code-query pairs and push apart the unpaired code snippets and queries. Moreover, data augmentation is critical in contrastive learning for learning high-quality representations. However, only semantic-preserving augmentations for source code are considered in existing work. In this work, we propose to perform soft data augmentation by dynamically masking and replacing some tokens in code sequences to generate code snippets that are similar, but not necessarily semantic-preserving, as positive samples for paired queries. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages. The experimental results show that our approach significantly outperforms the state-of-the-art methods. We also adapt our techniques to several pre-trained models such as RoBERTa and CodeBERT, and significantly boost their performance on the code search task.
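    The dynamic mask-and-replace augmentation can be illustrated with a minimal token-level sketch; the probabilities, the `[MASK]` symbol, and the toy vocabulary are assumptions for illustration, not the paper's exact recipe.

```python
import random

def soft_augment(tokens, vocab, mask_prob=0.1, replace_prob=0.1,
                 mask_token="[MASK]", rng=None):
    """Randomly mask or replace code tokens to form a 'soft' positive sample.

    A minimal sketch of soft data augmentation: the result is similar to
    the input but not necessarily semantic-preserving.
    """
    rng = rng or random.Random(0)
    out = []
    for tok in tokens:
        r = rng.random()
        if r < mask_prob:
            out.append(mask_token)            # mask the token
        elif r < mask_prob + replace_prob:
            out.append(rng.choice(vocab))     # replace with a random token
        else:
            out.append(tok)                   # keep the token unchanged
    return out

code = "def add ( a , b ) : return a + b".split()
vocab = ["x", "y", "sum", "mul"]  # hypothetical token vocabulary
aug = soft_augment(code, vocab, rng=random.Random(42))
```

    In a contrastive setup, `aug` would be paired with the original query as a positive sample.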
    Self supervised learning for robust voice cloning. (arXiv:2204.03421v1 [cs.SD])
    Voice cloning is a difficult task which requires robust and informative features incorporated in a high-quality TTS system in order to effectively copy an unseen speaker's voice. In our work, we utilize features learned in a self-supervised framework via the Bootstrap Your Own Latent (BYOL) method, which is shown to produce high-quality speech representations when specific audio augmentations are applied to the vanilla algorithm. We further extend the augmentations in the training procedure to help the resulting features capture the speaker identity and to make them robust to noise and acoustic conditions. The learned features are used as pre-trained utterance-level embeddings and as inputs to a Non-Attentive Tacotron based architecture, aiming to achieve multispeaker speech synthesis without utilizing additional speaker features. This method enables us to train our model on an unlabeled multispeaker dataset as well as to use unseen speaker embeddings to copy a speaker's voice. Subjective and objective evaluations are used to validate the proposed model, as well as its robustness to the acoustic conditions of the target utterance.
    Adaptive Spike-Like Representation of EEG Signals for Sleep Stages Scoring. (arXiv:2204.03565v1 [eess.SP])
    Recently, promising results have been achieved in automatic sleep stage scoring by extracting spatio-temporal features from the electroencephalogram (EEG). However, such methods entail laborious manual feature engineering and domain knowledge. In this study, we propose an adaptive scheme to probabilistically encode, filter and accumulate the input signals and to weight the resultant features by the half-Gaussian probabilities of the signal intensities. The adaptive representations are subsequently fed into a transformer model to automatically mine the relevance between features and the corresponding stages. Extensive experiments on the largest public dataset against state-of-the-art methods validate the effectiveness of our proposed method and reveal promising future directions.
    Deep transfer learning for system identification using long short-term memory neural networks. (arXiv:2204.03125v1 [eess.SY])
    Recurrent neural networks (RNNs) have many advantages over more traditional system identification techniques. They may be applied to linear and nonlinear systems, and they require fewer modeling assumptions. However, these neural network models may also need larger amounts of data to learn and generalize. Furthermore, training neural networks is a time-consuming process. Hence, building upon long short-term memory (LSTM) neural networks, this paper proposes using two types of deep transfer learning, namely parameter fine-tuning and freezing, to reduce the data and computation requirements for system identification. We apply these techniques to identify two dynamical systems, namely a second-order linear system and a Wiener-Hammerstein nonlinear system. Results show that, compared with direct learning, our method accelerates learning by 10% to 50%, which also saves data and computing resources.
    Covariate-assisted Sparse Tensor Completion. (arXiv:2103.06428v3 [stat.ML] UPDATED)
    We aim to provably complete a sparse and highly-missing tensor in the presence of covariate information along tensor modes. Our motivation comes from online advertising, where users' click-through rates (CTR) on ads over various devices form a CTR tensor that has about 96% missing entries and many zeros among its non-missing entries, which makes standalone tensor completion methods unsatisfactory. Besides the CTR tensor, additional ad features or user characteristics are often available. In this paper, we propose Covariate-assisted Sparse Tensor Completion (COSTCO) to incorporate covariate information for the recovery of the sparse tensor. The key idea is to jointly extract latent components from both the tensor and the covariate matrix to learn a synthetic representation. Theoretically, we derive the error bound for the recovered tensor components and explicitly quantify the improvements in both the reveal probability condition and the tensor recovery accuracy due to covariates. Finally, we apply COSTCO to an advertisement dataset consisting of a CTR tensor and an ad covariate matrix, leading to a 23% accuracy improvement over the baseline. An important by-product is that the ad latent components from COSTCO reveal interesting ad clusters, which are useful for better ad targeting.
    Covariance matrix preparation for quantum principal component analysis. (arXiv:2204.03495v1 [quant-ph])
    Principal component analysis (PCA) is a dimensionality reduction method in data analysis that involves diagonalizing the covariance matrix of the dataset. Recently, quantum algorithms have been formulated for PCA based on diagonalizing a density matrix. These algorithms assume that the covariance matrix can be encoded in a density matrix, but a concrete protocol for this encoding has been lacking. Our work aims to address this gap. Assuming amplitude encoding of the data, with the data given by the ensemble $\{p_i,| \psi_i \rangle\}$, one can easily prepare the ensemble average density matrix $\overline{\rho} = \sum_i p_i |\psi_i\rangle \langle \psi_i |$. We first show that $\overline{\rho}$ is precisely the covariance matrix whenever the dataset is centered. For quantum datasets, we exploit global phase symmetry to argue that there always exists a centered dataset consistent with $\overline{\rho}$, and hence $\overline{\rho}$ can always be interpreted as a covariance matrix. This provides a simple means for preparing the covariance matrix for arbitrary quantum datasets or centered classical datasets. For uncentered classical datasets, our method amounts to so-called "PCA without centering", which we interpret as PCA on a symmetrized dataset. We argue that this closely corresponds to standard PCA, and we derive equations and inequalities that bound the deviation of the spectrum obtained with our method from that of standard PCA. We numerically illustrate our method for the MNIST handwritten digit dataset. We also argue that PCA on quantum datasets is natural and meaningful, and we numerically implement our method for molecular ground-state datasets.
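    The classical core of the claim, that the ensemble-average outer product equals the covariance matrix for centered data, can be checked numerically. This sketch uses real vectors with uniform weights $p_i = 1/n$ and ignores amplitude normalization, which are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X = X - X.mean(axis=0)          # center the dataset

# Ensemble-average outer product, analogous to rho_bar with uniform p_i.
rho_bar = sum(np.outer(x, x) for x in X) / len(X)

# Population covariance of the centered data (divide by n, not n-1).
cov = (X.T @ X) / len(X)

assert np.allclose(rho_bar, cov)
```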
    Distributionally Robust Optimal Power Flow with Contextual Information. (arXiv:2109.07896v2 [math.OC] UPDATED)
    In this paper, we develop a distributionally robust chance-constrained formulation of the Optimal Power Flow problem (OPF) whereby the system operator can leverage contextual information. For this purpose, we exploit an ambiguity set based on probability trimmings and optimal transport through which the dispatch solution is protected against the incomplete knowledge of the relationship between the OPF uncertainties and the context that is conveyed by a sample of their joint probability distribution. We provide a tractable reformulation of the proposed distributionally robust chance-constrained OPF problem under the popular conditional-value-at-risk approximation. By way of numerical experiments run on a modified IEEE-118 bus network with wind uncertainty, we show how the power system can substantially benefit from taking into account the well-known statistical dependence between the point forecast of wind power outputs and its associated prediction error. Furthermore, the experiments conducted also reveal that the distributional robustness conferred on the OPF solution by our probability-trimmings-based approach is superior to that bestowed by alternative approaches in terms of expected cost and system reliability.  ( 2 min )
    A comparison of mixed-variables Bayesian optimization approaches. (arXiv:2111.01533v2 [math.OC] UPDATED)
    Most real optimization problems are defined over a mixed search space where the variables are both discrete and continuous. In engineering applications, the objective function is typically calculated with a numerically costly black-box simulation. General mixed and costly optimization problems are therefore of great practical interest, yet their resolution remains largely an open scientific question. In this article, costly mixed problems are approached through Gaussian processes where the discrete variables are relaxed into continuous latent variables. The continuous space is more easily explored by classical Bayesian optimization techniques than a mixed space would be. Discrete variables are recovered either subsequently to the continuous optimization, or simultaneously with an additional continuous-discrete compatibility constraint that is handled with augmented Lagrangians. Several possible implementations of such Bayesian mixed optimizers are compared. In particular, the reformulation of the problem with continuous latent variables is put in competition with searches working directly in the mixed space. Among the algorithms involving latent variables and an augmented Lagrangian, particular attention is devoted to the Lagrange multipliers, for which local and global estimation techniques are studied. The comparisons are based on the repeated optimization of three analytical functions and a beam design problem.
    Mo\"ET: Mixture of Expert Trees and its Application to Verifiable Reinforcement Learning. (arXiv:1906.06717v4 [cs.LG] UPDATED)
    Rapid advancements in deep learning have led to many recent breakthroughs. While deep learning models achieve superior performance, often statistically better than humans, their adoption into safety-critical settings, such as healthcare or self-driving cars, is hindered by their inability to provide safety guarantees or to expose the inner workings of the model in a human-understandable form. We present Mo\"ET, a novel model based on Mixture of Experts, consisting of decision tree experts and a generalized linear model gating function. Thanks to this gating function, the model is more expressive than a standard decision tree. To support non-differentiable decision trees as experts, we formulate a novel training procedure. In addition, we introduce a hard-thresholding version, Mo\"ETH, in which predictions are made solely by a single expert chosen via the gating function. Thanks to that property, each Mo\"ETH prediction can be decomposed into a set of logical rules in a form that can be easily verified. While Mo\"ET is a general-purpose model, we illustrate its power in the reinforcement learning setting. By training Mo\"ET models using an imitation learning procedure on deep RL agents, we outperform the previous state-of-the-art technique based on decision trees while preserving the verifiability of the models. Moreover, we show that Mo\"ET can also be used in real-world supervised problems, on which it outperforms other verifiable machine learning models.
    Amortized Auto-Tuning: Cost-Efficient Bayesian Transfer Optimization for Hyperparameter Recommendation. (arXiv:2106.09179v2 [cs.LG] UPDATED)
    With the surge in the number of hyperparameters and training times of modern machine learning models, hyperparameter tuning is becoming increasingly expensive. However, after assessing 40 tuning methods systematically, we find that each faces certain limitations. In particular, methods that speed up tuning via knowledge transfer typically require the final performance of hyperparameters and do not focus on low-fidelity information. As we demonstrate empirically, this common practice is suboptimal and can incur an unnecessary use of resources. It is more cost-efficient to instead leverage low-fidelity tuning observations to measure inter-task similarity and transfer knowledge from existing to new tasks accordingly. However, performing multi-fidelity tuning comes with its own challenges in the transfer setting: the noise in additional observations and the need for performance forecasting. Therefore, we propose and conduct a thorough analysis of a multi-task multi-fidelity Bayesian optimization framework, which leads to the best instantiation--amortized auto-tuning (AT2). We further present an offline-computed 27-task hyperparameter recommendation (HyperRec) database to serve the community. Extensive experiments on HyperRec and other real-world databases illustrate the effectiveness of our AT2 method.  ( 2 min )
    A Joint Learning Approach for Semi-supervised Neural Topic Modeling. (arXiv:2204.03208v1 [cs.IR])
    Topic models are some of the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We build upon these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the best of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low-labeled-data regimes and for datasets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.
    Differentially Private Set Union. (arXiv:2002.09745v2 [cs.CR] UPDATED)
    We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real-world applications; it is particularly ubiquitous in natural language processing (NLP) applications such as vocabulary extraction. For example, discovering words, sentences, $n$-grams, etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts end up far above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting, ensuring privacy is significantly more delicate. We prove that any policy which has certain $\textit{contractive}$ properties yields a differentially private algorithm. We design two new algorithms, one using Laplace noise and the other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.
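    The independent count-and-threshold baseline that the abstract contrasts against can be sketched as follows. The contribution cap, the Laplace noise scale, and the threshold formula are standard choices for this style of mechanism but are assumptions here, not the paper's new policy-based algorithms.

```python
import numpy as np
from collections import Counter

def dp_set_union_baseline(user_sets, eps, delta, max_contrib=1, rng=None):
    """Baseline (eps, delta)-DP set union: cap each user's contribution,
    add Laplace noise to the item counts, release items above a threshold.

    Illustrative sketch only; the cap and threshold are assumed choices.
    """
    rng = rng or np.random.default_rng(0)
    counts = Counter()
    for s in user_sets:
        for item in sorted(s)[:max_contrib]:   # cap each user's contribution
            counts[item] += 1
    scale = max_contrib / eps                  # Laplace scale for sensitivity
    threshold = 1 + scale * np.log(1.0 / (2 * delta))
    return {item for item, c in counts.items()
            if c + rng.laplace(scale=scale) > threshold}

users = [{"the", "cat"}, {"the", "dog"}, {"the", "cat"}, {"the"}]
out = dp_set_union_baseline(users, eps=1.0, delta=1e-3)
```

    The waste the abstract points to is visible here: a very frequent item like "the" accumulates a count far above the threshold, while the capped, independent contributions leave rarer items undiscovered.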
    Visualizing Deep Neural Networks with Topographic Activation Maps. (arXiv:2204.03528v1 [cs.LG])
    Machine Learning with Deep Neural Networks (DNNs) has become a successful tool for solving tasks across various fields of application. The success of DNNs is strongly connected to their high complexity in terms of the number of network layers or of neurons in each layer, which makes it hard to understand how DNNs solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience, because this field has rich experience in analyzing complex and opaque systems. In this work, we draw inspiration from how neuroscience uses topographic maps to visualize the activity of the brain when it performs certain tasks. Transferring this approach to DNNs can help to visualize and understand their internal processes more intuitively, too. However, the inner structures of brains and DNNs differ substantially. Therefore, to be able to visualize activations of neurons in DNNs as topographic maps, we research techniques to lay out the neurons in a two-dimensional space in which neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare different methods to obtain a topographic layout of the neurons in a network layer. Moreover, we demonstrate how to use the resulting topographic activation maps to identify errors or encoded biases in DNNs or data sets. Our novel visualization technique improves the transparency of DNN-based algorithmic decision-making systems and is accessible to a broad audience because topographic maps are intuitive to interpret without expert knowledge in Machine Learning.
    The Effects of Regularization and Data Augmentation are Class Dependent. (arXiv:2204.03632v1 [cs.LG])
    Regularization is a fundamental technique to prevent over-fitting and to improve generalization performance by constraining a model's complexity. Current deep networks heavily rely on regularizers such as data augmentation (DA) or weight decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that techniques such as DA or weight decay produce a model with a reduced complexity that is unfair across classes. The optimal amount of DA or weight decay found by cross-validation leads to disastrous model performance on some classes, e.g. on ImageNet with a ResNet-50, the "barn spider" classification test accuracy falls from $68\%$ to $46\%$ merely by introducing random-crop DA during training. Even more surprisingly, such a performance drop also appears when introducing uninformative regularization techniques such as weight decay. Those results demonstrate that our search for ever-increasing generalization performance -- averaged over all classes and samples -- has left us with models and regularizers that silently sacrifice performance on some classes. This scenario can become dangerous when deploying a model on downstream tasks, e.g. an ImageNet pre-trained ResNet-50 deployed on iNaturalist sees its performance on class \#8889 fall from $70\%$ to $30\%$ when random-crop DA is introduced during the ImageNet pre-training phase. These results demonstrate that designing novel regularizers without class-dependent bias remains an open research question.
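    The diagnostic underlying these observations is simply per-class accuracy, which averaging over all samples can hide. A minimal sketch on toy labels (the arrays are made-up data, not results from the paper):

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy computed separately for each class; averaging this over
    classes can mask large drops on individual classes."""
    acc = np.zeros(num_classes)
    for c in range(num_classes):
        mask = y_true == c
        acc[c] = (y_pred[mask] == c).mean() if mask.any() else np.nan
    return acc

y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 0])
acc = per_class_accuracy(y_true, y_pred, 3)
# Overall accuracy is 5/8 = 0.625, yet class 2 is only 1/3 correct.
```

    Comparing such per-class curves with and without a given regularizer is one way to surface the class-dependent effects the abstract describes.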
    Categorical Distributions of Maximum Entropy under Marginal Constraints. (arXiv:2204.03406v1 [hep-th])
    The estimation of categorical distributions under marginal constraints, summarizing some sample from a population in the most generalizable way, is key for many machine-learning and data-driven approaches. We provide a parameter-agnostic theoretical framework that enables this task by ensuring (i) that a categorical distribution of maximum entropy under marginal constraints always exists and (ii) that it is unique. The procedure of iterative proportional fitting (IPF) naturally estimates that distribution from any consistent set of marginal constraints directly in the space of probabilities, thus deductively identifying a least-biased characterization of the population. The theoretical framework together with IPF leads to a holistic workflow that enables modeling any class of categorical distributions solely using the phenomenological information provided.
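    Iterative proportional fitting itself is short enough to sketch directly: the table is alternately rescaled so that its row and column sums match the target marginals. The 2-D case and the uniform seed below are illustrative simplifications.

```python
import numpy as np

def ipf(seed, row_marg, col_marg, iters=100):
    """Iterative proportional fitting on a 2-D table: rescale rows and
    columns of a seed table until its marginals match the targets."""
    P = seed.astype(float).copy()
    for _ in range(iters):
        P *= (row_marg / P.sum(axis=1))[:, None]   # match row sums
        P *= (col_marg / P.sum(axis=0))[None, :]   # match column sums
    return P

seed = np.ones((2, 3))                              # uninformative seed
P = ipf(seed, row_marg=np.array([0.4, 0.6]),
        col_marg=np.array([0.2, 0.3, 0.5]))
```

    With a uniform seed the fixed point is the independent product of the marginals, which is also the maximum-entropy table consistent with them.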
    GFlowNet Foundations. (arXiv:2111.09266v2 [cs.LG] UPDATED)
    Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single but trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions and modular energy functions.  ( 2 min )
    Flexible Amortized Variational Inference in qBOLD MRI. (arXiv:2203.05845v2 [eess.IV] UPDATED)
    Streamlined qBOLD acquisitions enable experimentally straightforward observations of brain oxygen metabolism. $R_2^\prime$ maps are easily inferred; however, the oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are more ambiguously determined from the data. As such, existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV. This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV. Initially, we create a model that produces an informative voxelwise prior distribution based on synthetic training data. Contrary to prior work, we model the joint distribution of OEF and DBV through a scaled multivariate logit-Normal distribution, which enables the values to be constrained within a plausible range. The prior distribution model is used to train an efficient amortized variational Bayesian inference model. This model learns to infer OEF and DBV by predicting real image data, with few training data required, using the signal equations as a forward model. We demonstrate that our approach enables the inference of smooth OEF and DBV maps, with a physiologically plausible distribution that can be adapted through the specification of an informative prior distribution. Other benefits include model comparison (via the evidence lower bound) and uncertainty quantification for identifying image artefacts. Results are demonstrated on a small study comparing subjects undergoing hyperventilation and at rest. We illustrate that the proposed approach allows measurement of gray matter differences in OEF and DBV and enables voxelwise comparison between conditions, where we observe significant increases in OEF and $R_2^\prime$ during hyperventilation.
    Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial Diffusion Geometry. (arXiv:2103.15783v2 [cs.LG] UPDATED)
    Clustering algorithms partition a dataset into groups of similar points. The primary contribution of this article is the Multiscale Spatially-Regularized Diffusion Learning (M-SRDL) clustering algorithm, which uses spatially-regularized diffusion distances to efficiently and accurately learn multiple scales of latent structure in hyperspectral images. The M-SRDL clustering algorithm extracts clusterings at many scales from a hyperspectral image and outputs the variation-of-information barycenter of these clusterings as an exemplar of the underlying cluster structure. We show that incorporating spatial regularization into a multiscale clustering framework results in smoother and more coherent clusters when applied to hyperspectral data, yielding more accurate clustering labels.
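    The distance underlying that barycenter, the variation of information between two clusterings, is easy to compute from label arrays; this is a generic textbook implementation, not code from the paper.

```python
import numpy as np
from collections import Counter

def variation_of_information(a, b):
    """Variation of information between two clusterings given as label
    sequences: VI = H(A) + H(B) - 2 I(A; B), a metric on clusterings."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    pab = Counter(zip(a, b))
    vi = 0.0
    for (x, y), nxy in pab.items():
        pxy = nxy / n
        vi += pxy * (np.log(pa[x] / n) + np.log(pb[y] / n)
                     - 2 * np.log(pxy))
    return vi

a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
vi = variation_of_information(a, b)   # 2*log(2) for these two clusterings
```

    Identical clusterings have VI = 0, and maximally uninformative pairs like the example above attain larger values; a barycenter minimizes the summed VI to all scale-specific clusterings.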
    Sliced gradient-enhanced Kriging for high-dimensional function approximation. (arXiv:2204.03562v1 [stat.ML])
    Gradient-enhanced Kriging (GE-Kriging) is a well-established surrogate modelling technique for approximating expensive computational models. However, it tends to become impractical for high-dimensional problems due to the large inherent correlation matrix and the associated high-dimensional hyper-parameter tuning problem. To address these issues, we propose a new method in this paper, called sliced GE-Kriging (SGE-Kriging), which reduces both the size of the correlation matrix and the number of hyper-parameters. Firstly, we perform a derivative-based global sensitivity analysis to detect the relative importance of each input variable with respect to the model response. Then, we propose to split the training sample set into multiple slices, and invoke Bayes' theorem to approximate the full likelihood function via a sliced likelihood function, in which multiple small correlation matrices are utilized to describe the correlation of the sample set. Additionally, we replace the original high-dimensional hyper-parameter tuning problem with a low-dimensional counterpart by learning the relationship between the hyper-parameters and the global sensitivity indices. Finally, we validate SGE-Kriging by means of numerical experiments with several benchmark problems. The results show that the SGE-Kriging model features accuracy and robustness comparable to the standard model but at a much lower training cost. The benefits are most evident in high-dimensional problems.
    Data blurring: sample splitting a single sample. (arXiv:2112.11079v2 [stat.ME] UPDATED)
    Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X),g(X))$ is tractable? As one example, if $X=(X_1,\dots,X_n)$ and $P$ is a product distribution, then for any $m<n$, we can split the sample to define $f(X)=(X_1,\dots,X_m)$ and $g(X)=(X_{m+1},\dots,X_n)$. Rasines and Young (2021) offer an alternative route to accomplishing this task through randomization of $X$ with additive Gaussian noise, which enables post-selection inference in finite samples for Gaussian-distributed data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data blurring, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
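    The additive-Gaussian-noise route can be sketched numerically: the two parts jointly recover $X$ exactly, while neither does alone. The value of `gamma` and the unit noise scale are illustrative choices; the independence of the two parts (for Gaussian data with a noise scale matched to the data variance) is not checked by this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, size=1000)   # observed sample
gamma = 0.5
W = rng.normal(size=X.shape)         # auxiliary noise with known scale

U = X + gamma * W                    # one part, e.g. used for selection
V = X - W / gamma                    # the other part, used for inference

# Together the two parts recover X exactly:
X_rec = (U + gamma**2 * V) / (1 + gamma**2)
assert np.allclose(X_rec, X)
```

    Algebraically, $U + \gamma^2 V = (1+\gamma^2) X$, so the noise cancels only when both parts are combined.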
    Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning. (arXiv:2108.03706v2 [stat.ML] UPDATED)
    The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms. Existing methods for statistical inference in online learning are restricted to settings involving independently sampled observations, while existing statistical inference methods in reinforcement learning (RL) are limited to the batch setting. The online bootstrap is a flexible and efficient approach for statistical inference in linear stochastic approximation algorithms, but its efficacy in settings involving Markov noise, such as RL, has yet to be explored. In this paper, we study the use of the online bootstrap method for statistical inference in RL. In particular, we focus on the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms, which are themselves special instances of linear stochastic approximation under Markov noise. The method is shown to be distributionally consistent for statistical inference in policy evaluation, and numerical experiments are included to demonstrate the effectiveness of this algorithm at statistical inference tasks across a range of real RL environments.  ( 2 min )
    Understanding Dynamics of Nonlinear Representation Learning and Its Application. (arXiv:2106.14836v3 [cs.LG] UPDATED)
    Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are subsequently used for classification at its output layer. This happens implicitly during training through minimizing a supervised or unsupervised loss. In this paper, we study the dynamics of such implicit nonlinear representation learning. We identify a pair of a new assumption and a novel condition, called the common model structure assumption and the data-architecture alignment condition. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for the global convergence and necessary for the global optimality. Moreover, our theory explains how and when increasing the network size does and does not improve the training behaviors in the practical regime. Our results provide practical guidance for designing a model structure: e.g., the common model structure assumption can be used as a justification for using a particular model structure instead of others. We also derive a new training framework, which satisfies the data-architecture alignment condition by automatically modifying any given training algorithm. Given a standard training algorithm, the framework running its modified version is empirically shown to maintain competitive test performances while providing global convergence guarantees for deep residual neural networks with convolutions, skip connections, and batch normalization with datasets, including MNIST, CIFAR-10, CIFAR-100, Semeion, KMNIST and SVHN.  ( 3 min )
    VNIbCReg: VICReg with Neighboring-Invariance and better-Covariance Evaluated on Non-stationary Seismic Signal Time Series. (arXiv:2204.02697v2 [cs.LG] UPDATED)
    One of the latest self-supervised learning (SSL) methods, VICReg, showed great performance in both the linear evaluation and the fine-tuning evaluation. However, VICReg was proposed in computer vision: it learns by pulling together representations of random crops of an image while maintaining the representation space through the variance and covariance losses. VICReg is therefore likely to be ineffective on non-stationary time series, where different parts/crops of the input should be encoded differently to account for the non-stationarity. Another recent SSL proposal, Temporal Neighborhood Coding (TNC), is effective for encoding non-stationary time series. This study shows that a combination of a VICReg-style method and TNC is very effective for SSL on non-stationary time series, where a non-stationary seismic signal time series is used as an evaluation dataset.  ( 2 min )
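The three VICReg terms the abstract refers to (invariance, variance, covariance) can be sketched in a few lines of numpy. This is a hedged illustration of the loss structure, not the authors' implementation; the weights `inv_w`/`var_w`/`cov_w` and `eps` are placeholder choices:

```python
import numpy as np

def vicreg_loss(z_a, z_b, inv_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg-style loss on two (n, d) batches of embeddings.

    Three terms: invariance (MSE between the two views), variance (a hinge
    keeping each dimension's std above 1), and covariance (penalizing
    off-diagonal covariance entries to decorrelate dimensions).
    """
    n, d = z_a.shape
    inv = np.mean((z_a - z_b) ** 2)
    std_a = np.sqrt(z_a.var(axis=0) + eps)
    std_b = np.sqrt(z_b.var(axis=0) + eps)
    var = np.mean(np.maximum(0.0, 1.0 - std_a)) + np.mean(np.maximum(0.0, 1.0 - std_b))

    def off_diag_sq(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        return (cov ** 2).sum() - (np.diag(cov) ** 2).sum()

    cov = (off_diag_sq(z_a) + off_diag_sq(z_b)) / d
    return inv_w * inv + var_w * var + cov_w * cov

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 8))
loss = vicreg_loss(z, z.copy())  # identical views: invariance term is zero
```

With identical views the invariance term vanishes and the remaining loss comes only from the variance and covariance regularizers.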
    Bidimensional linked matrix factorization for pan-omics pan-cancer analysis. (arXiv:2002.02601v2 [stat.ML] UPDATED)
    Several modern applications require the integration of multiple large data matrices that have shared rows and/or columns. For example, cancer studies that integrate multiple omics platforms across multiple types of cancer, pan-omics pan-cancer analysis, have extended our knowledge of molecular heterogeneity beyond what was observed in single-tumor and single-platform studies. However, these studies have been limited by available statistical methodology. We propose a flexible approach to the simultaneous factorization and decomposition of variation across such bidimensionally linked matrices, BIDIFAC+. This decomposes variation into a series of low-rank components that may be shared across any number of row sets (e.g., omics platforms) or column sets (e.g., cancer types). This builds on a growing literature for the factorization and decomposition of linked matrices, which has primarily focused on multiple matrices that are linked in one dimension (rows or columns) only. Our objective function extends nuclear norm penalization, is motivated by random matrix theory, gives an identifiable decomposition under relatively mild conditions, and can be shown to give the mode of a Bayesian posterior distribution. We apply BIDIFAC+ to pan-omics pan-cancer data from TCGA, identifying shared and specific modes of variability across 4 different omics platforms and 29 different cancer types.  ( 2 min )
    Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning. (arXiv:2004.10888v6 [cs.LG] UPDATED)
    We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable. MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf, in both on- and off-policy settings. This flexibility reduces the gap between risk-neutral control and risk-averse control and is achieved by working on a novel augmented MDP directly. We propose risk-averse TD3 as an example instantiating MVPI, which outperforms vanilla TD3 and many previous risk-averse control methods in challenging Mujoco robot simulation tasks under a risk-aware performance metric. This risk-averse TD3 is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are key to the performance boost we show in Mujoco domains.  ( 2 min )
    DeepTensor: Low-Rank Tensor Decomposition with Deep Network Priors. (arXiv:2204.03145v1 [stat.AP])
    DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors (e.g., a matrix as the outer product of two vectors), where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-squared approximation error. Our key observation is that the implicit regularization inherent in DNs enables them to capture nonlinear signal structures (e.g., manifolds) that are out of the reach of classical linear methods like the singular value decomposition (SVD) and principal component analysis (PCA). Furthermore, in contrast to the SVD and PCA, whose performance deteriorates when the tensor's entries deviate from additive white Gaussian noise, we demonstrate that the performance of DeepTensor is robust to a wide range of distributions. We validate that DeepTensor is a robust and computationally efficient drop-in replacement for the SVD, PCA, nonnegative matrix factorization (NMF), and similar decompositions by exploring a range of real-world applications, including hyperspectral image denoising, 3D MRI tomography, and image classification. In particular, DeepTensor offers a 6dB signal-to-noise ratio improvement over standard denoising methods for signals corrupted by Poisson noise and learns to decompose 3D tensors 60 times faster than a single DN equipped with 3D convolutions.  ( 2 min )
    MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems. (arXiv:2204.03193v1 [stat.ML])
    A new data-driven method for operator learning of stochastic differential equations (SDEs) is proposed in this paper. The central goal is to solve forward and inverse stochastic problems more effectively using limited data. The deep operator network (DeepONet) has been proposed recently for operator learning. Unlike neural networks that learn functions, it targets the problem of learning nonlinear operators. However, it can be challenging to use the original model to learn nonlinear operators for high-dimensional stochastic problems. We propose a new multi-resolution autoencoder DeepONet model, referred to as MultiAuto-DeepONet, to deal with this difficulty with the aid of a convolutional autoencoder. The encoder part of the network is designed to reduce the dimensionality as well as discover the hidden features of high-dimensional stochastic inputs. The decoder is designed to have a special structure, i.e., in the form of a DeepONet. The first DeepONet in the decoder is designed to reconstruct the input function involving randomness, while the second one is used to approximate the solution of the desired equations. These two DeepONets have a common branch net and two independent trunk nets. This architecture enables us to deal with multi-resolution inputs naturally. By adding $L_1$ regularization to our network, we found the outputs from the branch net and the two trunk nets all have sparse structures. This reduces the number of trainable parameters in the neural network, thus making the model more efficient. Finally, we conduct several numerical experiments to illustrate the effectiveness of our proposed MultiAuto-DeepONet model with uncertainty quantification.  ( 2 min )
    A novel nonconvex, smooth-at-origin penalty for statistical learning. (arXiv:2204.03123v1 [stat.ML])
    Nonconvex penalties are utilized for regularization in high-dimensional statistical learning algorithms primarily because they yield unbiased or nearly unbiased estimators for the parameters in the model. Nonconvex penalties existing in the literature such as SCAD, MCP, Laplace and arctan have a singularity at origin which makes them useful also for variable selection. However, in several high-dimensional frameworks such as deep learning, variable selection is less of a concern. In this paper, we present a nonconvex penalty which is smooth at origin. The paper includes asymptotic results for ordinary least squares estimators regularized with the new penalty function, showing asymptotic bias that vanishes exponentially fast. We also conducted an empirical study employing deep neural network architecture on three datasets and convolutional neural network on four datasets. The empirical study showed better performance for the new regularization approach in five out of the seven datasets.  ( 2 min )
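The abstract does not give the new penalty's functional form. As a hedged illustration of the property being described, here is one example of a bounded nonconvex penalty that is smooth at the origin, in contrast to SCAD, MCP, Laplace, and arctan, which all have a singularity at 0. The specific function lam*beta^2/(1+beta^2) is an assumption for illustration, not the paper's penalty:

```python
import numpy as np

# Illustrative only (not the paper's penalty, which the abstract does not
# specify): a bounded nonconvex penalty with zero derivative at the origin,
# so it does not perform variable selection the way SCAD/MCP do.
def smooth_nonconvex_penalty(beta, lam=1.0):
    return lam * beta ** 2 / (1.0 + beta ** 2)

b = np.linspace(-5, 5, 1001)
p = smooth_nonconvex_penalty(b)
```

Because the penalty flattens out for large |beta|, large coefficients are barely shrunk, which is the mechanism behind the near-unbiasedness mentioned in the abstract.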
    Statistical Model Criticism of Variational Auto-Encoders. (arXiv:2204.03030v1 [cs.LG])
    We propose a framework for the statistical evaluation of variational auto-encoders (VAEs) and test two instances of this framework in the context of modelling images of handwritten digits and a corpus of English text. Our take on evaluation is based on the idea of statistical model criticism, popular in Bayesian data analysis, whereby a statistical model is evaluated in terms of its ability to reproduce statistics of an unknown data generating process from which we can obtain samples. A VAE learns not one, but two joint distributions over a shared sample space, each exploiting a choice of factorisation that makes sampling tractable in one of two directions (latent-to-data, data-to-latent). We evaluate samples from these distributions, assessing their (marginal) fit to the observed data and our choice of prior, and we also evaluate samples through a pipeline that connects the two distributions starting from a data sample, assessing whether together they exploit and reveal latent factors of variation that are useful to a practitioner. We show that this methodology offers possibilities for model selection qualitatively beyond intrinsic evaluation metrics and at a finer granularity than commonly used statistics can offer.  ( 2 min )
    What You See is What You Get: Distributional Generalization for Algorithm Design in Deep Learning. (arXiv:2204.03230v1 [cs.LG])
    We investigate and leverage a connection between Differential Privacy (DP) and the recently proposed notion of Distributional Generalization (DG). Applying this connection, we introduce new conceptual tools for designing deep-learning methods that bypass "pathologies" of standard stochastic gradient descent (SGD). First, we prove that differentially private methods satisfy a "What You See Is What You Get (WYSIWYG)" generalization guarantee: whatever a model does on its train data is almost exactly what it will do at test time. This guarantee is formally captured by distributional generalization. WYSIWYG enables principled algorithm design in deep learning by reducing $\textit{generalization}$ concerns to $\textit{optimization}$ ones: in order to mitigate unwanted behavior at test time, it is provably sufficient to mitigate this behavior on the train data. This is notably false for standard (non-DP) methods, hence this observation has applications even when privacy is not required. For example, importance sampling is known to fail for standard SGD, but we show that it has exactly the intended effect for DP-trained models. Thus, with DP-SGD, unlike with SGD, we can influence test-time behavior by making principled train-time interventions. We use these insights to construct simple algorithms which match or outperform SOTA in several distributional robustness applications, and to significantly improve the privacy vs. disparate impact trade-off of DP-SGD. Finally, we also improve on known theoretical bounds relating differential privacy, stability, and distributional generalization.  ( 2 min )
    Composite Spatial Monte Carlo Integration Based on Generalized Least Squares. (arXiv:2204.03248v1 [stat.CO])
    Although evaluation of the expectations on the Ising model is essential in various applications, this is frequently infeasible because of intractable multiple summations (or integrations). Spatial Monte Carlo integration (SMCI) is a sampling-based approximation, and can provide high-accuracy estimations for such intractable expectations. To evaluate the expectation of a function of variables in a specific region (called target region), SMCI considers a larger region containing the target region (called sum region). In SMCI, the multiple summation for the variables in the sum region is precisely executed, and that in the outer region is evaluated by the sampling approximation such as the standard Monte Carlo integration. It is guaranteed that the accuracy of the SMCI estimator is monotonically improved as the size of the sum region increases. However, a haphazard expansion of the sum region could cause a combinatorial explosion. Therefore, we hope to improve the accuracy without such region expansion. In this study, based on the theory of generalized least squares, a new effective method is proposed by combining multiple SMCI estimators. The validity of the proposed method is demonstrated theoretically and numerically. The results indicate that the proposed method can be effective in the inverse Ising problem (or Boltzmann machine learning).  ( 2 min )
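The core idea of combining several unbiased estimators via generalized least squares can be sketched independently of the SMCI specifics: given estimates with a known (or estimated) covariance Sigma, the minimum-variance unbiased combination uses weights Sigma^{-1} 1 / (1^T Sigma^{-1} 1). A minimal numpy sketch, with made-up example numbers:

```python
import numpy as np

def gls_combine(estimates, cov):
    """Minimum-variance unbiased combination of correlated unbiased estimators.

    The weights w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1) minimize Var(w^T x)
    subject to the weights summing to one (the BLUE of the common mean).
    """
    ones = np.ones(len(estimates))
    w = np.linalg.solve(cov, ones)
    w = w / (ones @ w)
    return w @ estimates, w

# Three noisy estimates of the same expectation, with a made-up covariance.
est = np.array([1.02, 0.97, 1.10])
cov = np.array([[0.010, 0.002, 0.000],
                [0.002, 0.020, 0.000],
                [0.000, 0.000, 0.050]])
combined, w = gls_combine(est, cov)
```

By construction the combined estimator's variance is never worse than that of the best single estimator, which is the sense in which combining SMCI estimators can improve accuracy without expanding the sum region.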

  • Open

    [D] Any guesstimates for how much a DALLE 2 generation will eventually cost?
    Just based on the estimated running costs of GPT3, and then whatever profit gets applied on top of that, are there any estimates for what openai will eventually charge for image generation? submitted by /u/EugeneJudo [link] [comments]  ( 1 min )
    Problem with CVPR template and arXiv? [D]
    I don't know what would be the best place to post this. But I am having trouble uploading an Overleaf manuscript to arXiv based on the CVPR 2022 template. I am getting the following error. Does anyone have any ideas? ​ https://preview.redd.it/y9jq0rbysds81.png?width=1614&format=png&auto=webp&s=16be3d32468837f649e846ec8a309dab2854c762 submitted by /u/avd4292 [link] [comments]
    [R]Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language - Google Apr 2022
    Paper: https://arxiv.org/abs/2204.00598 https://socraticmodels.github.io/ Twitter: https://twitter.com/andyzengtweets/status/1512089759497269251 Abstract: " Large foundation models can exhibit unique capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g. from spreadsheets, to SAT questions). As a result, these models store different forms of commonsense knowledge across different domains. In this work, we show that this model diversity is symbiotic, and can be leveraged to build AI systems with structured Socratic dialogue -- in whi…  ( 1 min )
    [D] What to do next after the sanity check?
    I have two years of time-series data taken from two sensors which I have split into 80/10/10 non-overlapping train/val/test splits. The task is to denoise one sensor data into another and I am handling it as a regression problem. I am following this website and considered an already published model (5 convolutional and 1 fully connected layer) which is trained on a similar dataset and same task. For the sake of sanity check, as per the website, I have trained the model on a subset of trainset (3 months) and tried to overfit it (while evaluating on complete val set), which works fine. However, I am not sure what to do next from this point on? Shall I just train on the complete trainset now? Or do I increase the layers or play with other hyper params to find more details about my regression problem/data? I would really appreciate your comments. Thank you. PS. The target value is sparse i.e. more than 85% of the time it is zero. submitted by /u/muaz_usmani [link] [comments]  ( 1 min )
    [D] Leaving ML for Software Engineering?
    I'm keen to hear from people who have made the transition from ML research/engineering positions to software engineering roles (or who are considering it). What were your reasons for doing it, and did you regret it? I see so many articles about transitioning from software to ML but none about going in the opposite direction. Context: I've been working as an ML engineer for a little over a year, and I'm just... not enjoying it. I want to love my job so badly as I like my boss, my colleagues, and the company (and I'm paid quite well for my level), but I just don't. I feel like the type of work I'm doing is not very smart and yet it's extremely draining: I spend so many hours just looking at loss curves, tweaking features and parameters. I'm somehow bored and stressed at the same time, because I don't enjoy the work and yet I feel the pressure to produce good models, and when they don't work as expected I can't help but take it personally, as if they would work if I just tried hard enough. I find that on the days where I end up taking care of more purely engineering tasks I have a lot more fun, and I finish the day more satisfied and less drained. I think I just want to build something instead of spending hours banging my head against shit data. I would love to hear from people who feel or have felt the same way, because whenever I speak about this with friends who are in ML they look at me like I'm a lunatic for wanting to leave it for software engineering. I'm obviously aware that SWE roles are not all fun and games, but I just feel like there's been an excessive push for so many people to move to ML because it's "cool" and "smart", when in reality they're just different things that are going to suit different people. submitted by /u/hedy-m [link] [comments]  ( 5 min )
    [D] Is virtual ICLR 2022 worth paying for
    The 2022 ICLR conference at the end of this month is virtual and costs $100 to attend. I was thinking of attending for networking opportunities but I’m not sure. Is it a good idea to go for it? submitted by /u/sybar142857 [link] [comments]  ( 1 min )
    [D] Triplet vs. Contrastive Loss
    The online triplet mining strategy is more efficient than the offline one. It implies "getting a batch of n samples and their associated labels, and form triplets on the fly." Here is an article about Triplet vs. Contrastive Loss comparison and its efficient implementation. I would like to know your feedback. submitted by /u/devzaya [link] [comments]
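The "form triplets on the fly" idea can be made concrete with batch-hard mining: for each anchor in the batch, take the farthest same-label sample as the positive and the closest different-label sample as the negative, then apply the triplet hinge. This is a minimal numpy sketch; the margin value and cluster data are placeholder choices:

```python
import numpy as np

def batch_hard_triplet_loss(emb, labels, margin=0.2):
    """Online (batch-hard) triplet mining on a batch of embeddings.

    For each anchor, pick the hardest positive (farthest same-label sample)
    and hardest negative (closest different-label sample) in the batch, then
    apply the standard triplet hinge max(d(a,p) - d(a,n) + margin, 0).
    """
    # Pairwise Euclidean distances via the squared-norm expansion.
    sq = (emb ** 2).sum(axis=1)
    d = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * emb @ emb.T, 0.0))
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)          # an anchor is not its own positive
    diff = labels[:, None] != labels[None, :]
    hardest_pos = np.where(same, d, -np.inf).max(axis=1)
    hardest_neg = np.where(diff, d, np.inf).min(axis=1)
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 8)
# Two well-separated clusters: every triplet already satisfies the margin.
emb = rng.normal(size=(16, 4)) * 0.1 + labels[:, None] * 5.0
loss = batch_hard_triplet_loss(emb, labels)
```

Because all triplets come from one forward pass over the batch, no separate offline triplet-generation step is needed.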
    [P] Animated Character Generator
    Hello everybody, I'd like to share the latest machine learning project of mine. It allows one to generate animated characters in the style of old video game consoles. Here are some examples. I would appreciate any feedback. https://i.redd.it/sig6ilpi8bs81.gif https://i.redd.it/xaf906qi8bs81.gif https://i.redd.it/8v7lz2qi8bs81.gif submitted by /u/ie9res [link] [comments]
    [D] Annotation formats for image annotations?
    Hey ML people, what is your favorite annotation format for image bounding boxes/labels? I know COCO is very popular; we are rethinking parts of our data infrastructure and wondering what everyone is using. Our platform hosts hundreds of millions of images. The ideal format would support running queries on data stored in a data lake. If the format supports 3D annotation types, that is even better. Thanks for your insights in advance. submitted by /u/mmuppidi [link] [comments]  ( 1 min )
    [N] OpenAI's DALL-E 2 paper "Hierarchical Text-Conditional Image Generation with CLIP Latents" has been updated with added section "Training details" (see Appendix C)
    New version of paper is linked to in the DALL-E 2 blog post and also here (pdf file format). Tweet announcing updated paper. Older version of paper (pdf file format). Original Reddit post. submitted by /u/Wiskkey [link] [comments]  ( 1 min )
    Dense Passage Retriever(DPR) Open-QA System [P]
    Hi, I made a video explaining Dense Passage Retriever(DPR) paper. We specifically explain the End to End QA system suggested in the latter part of the paper which discusses how to build an Open-QA system using dense retrievers. DPR was one of the first papers that discussed building dense retrievers using QA pairs only and didn't require a big pretraining computational setup like ORQA or REALM. It is currently used in a lot of places as a dense retriever. You can find Hugginface and Haystack implementations also. This video is part of a series on Open-QA using dense retrievers. We have made 2 videos on DPR. In the latter, we discuss how to build a dense retriever from scratch. Thanks for the support and it would be great if you could give any feedback. https://www.youtube.com/watch?v=rvcyyJNjPU0 submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    [D] Works that can process variable input resolution of images
    Hi. I'm looking for existing computer vision papers/networks that can process variable input resolution. Can anyone point me to similar works? For example, a network/layer N that can take inputs of both H*W and 2H*2W individually and give a correct prediction. One example I know of is the ROI pooling used in Faster R-CNN. Thanks very much. submitted by /u/vincent341 [link] [comments]  ( 1 min )
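Besides ROI pooling, the standard trick for variable input resolution is a pooling layer with a fixed output grid regardless of input size (spatial pyramid pooling, or PyTorch's AdaptiveAvgPool2d). A minimal numpy sketch of the binning logic, assuming a single-channel feature map for simplicity:

```python
import numpy as np

def adaptive_avg_pool2d(x, out_h, out_w):
    """Average-pool an (H, W) feature map to a fixed (out_h, out_w) grid,
    regardless of the input resolution (the idea behind SPP / ROI pooling)."""
    h, w = x.shape
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        # Bin boundaries cover the input evenly: floor/ceil of i*H/out_h.
        r0, r1 = (i * h) // out_h, -(-(i + 1) * h // out_h)
        for j in range(out_w):
            c0, c1 = (j * w) // out_w, -(-(j + 1) * w // out_w)
            out[i, j] = x[r0:r1, c0:c1].mean()
    return out

rng = np.random.default_rng(0)
small = adaptive_avg_pool2d(rng.normal(size=(32, 32)), 4, 4)
large = adaptive_avg_pool2d(rng.normal(size=(64, 48)), 4, 4)
```

Both a 32x32 and a 64x48 input come out as a fixed 4x4 grid, so any fully connected head after this layer sees a constant-size vector.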
    [D] Machine Learning Engineers - What Does Your Day Involve?
    Hey, I'm looking to transition from my current role as a data scientist to one that has a machine learning engineering focus. I was wondering if anyone could provide insights into how they plan their day, or what activities you do throughout the day/week. I'd be particularly interested to understand the balance between deploying models/writing production worthy code and your time spent learning/developing given the field is moving so fast. submitted by /u/MenArePigs69 [link] [comments]  ( 4 min )
    [D] Bayesian Non-Parametrics for Ranking?
    I am currently sitting on a difficult machine-learning problem for which I have found no literature. I am given n datapoints x_1,...,x_n that are ordered according to a ranking preference rank(x_1)<rank(x_2)<...<rank(x_n). I am assuming there exists a function f such that f(x_i)<f(x_{i+1}). I am now searching for a Bayesian non-parametric model that gives the posterior probability of functions f satisfying f(x_i)<f(x_{i+1}), so that I can estimate the relative rank preferences at new points. I have tried out a few things. The naive approach is using a GP prior on f. Unfortunately, computing the posterior distribution p(f(x_1), ..., f(x_n) | f(x_1)<...<f(x_n)) has no closed-form solution (it is a normal distribution with n linear constraints, which is absolutely terrible to sample from). This makes computing conditional distributions for predictions very challenging. I am currently approximating the solution by using a GP regression model with label y_i = rank(x_i) = i. But this systematically under-estimates the shape variation, because it adds the assumption that function values between ranks are equidistant. Is there any known approach for this? submitted by /u/Ulfgardleo [link] [comments]  ( 2 min )
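For concreteness, the workaround described in the post (plain GP regression on the labels y_i = rank(x_i) = i) can be sketched as below. The RBF kernel, length scale, and noise level are placeholder choices, and this sketch inherits exactly the equidistant-ranks assumption the post complains about:

```python
import numpy as np

def gp_posterior_mean(X, y, X_new, length=1.0, noise=1e-2):
    """GP regression posterior mean with an RBF kernel.

    Used here as the post's workaround: regress y_i = rank(x_i) = i, which
    (incorrectly, as the post notes) treats consecutive ranks as equidistant.
    """
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length ** 2)

    K = rbf(X, X) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    return rbf(X_new, X) @ alpha

# Ten 1-D points whose rank increases with x.
X = np.linspace(0, 3, 10)[:, None]
y = np.arange(10, dtype=float)          # y_i = rank(x_i) = i
pred = gp_posterior_mean(X, y, np.array([[0.5], [2.5]]))
```

The predicted values at new points can then be compared to estimate relative rank preferences, which is all the post needs; only the ordering of the predictions is meaningful under this approximation.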
    [R] Video Diffusion Models
    From the webpage: We present results on video generation using diffusion models. We propose an architecture for video diffusion models which is a natural extension of the standard image architecture. We show that this architecture is effective for jointly training from image and video data. To generate long and higher resolution videos we introduce a new conditioning technique that performs better than previously proposed methods. We present results on text-conditioned video generation and state-of-the-art results on an unconditional video generation benchmark. Paper: https://arxiv.org/abs/2204.03458 https://video-diffusion.github.io/ submitted by /u/hardmaru [link] [comments]  ( 1 min )
    [D] how to decide publication venue
    How do you decide if a paper is appropriate for a specific venue? Moreover, how would you characterize the difference between a good NeurIPS publication and a good CVPR or ICCV publication? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
    [N] LR Warmup for PyTorch
    pytorch_warmup v0.1.0 was released: RadamWarmup + CosineAnnealingLR + StepLR (Colab link). submitted by /u/TonyY_RIMCS [link] [comments]
  • Open

    RL for dynamic environments
    In their 2019 review article in Nature Machine Intelligence, Neftci and Averbeck point out, “Most work in biological systems has focused on simple learning problems… where flexibility and ongoing learning are important, similar to real-world learning problems. In contrast, most work in artificial agents has focused on learning a single complex problem in a static environment.” Are there RL approaches designed to handle dynamic environments with changing reward functions? I did find this earlier post, but thought I'd ask if anyone had other suggested lines of reading. Thanks! submitted by /u/Careless-Argument-37 [link] [comments]  ( 1 min )
    Observationspace max Size?
    I want to give my AI as much information as possible. Can a too-large observation space cause issues? submitted by /u/Willing-Classroom735 [link] [comments]  ( 1 min )
    "UC Berkeley’s Pieter Abbeel receives 2021 ACM Prize in Computing" (for DRL robotics)
    submitted by /u/gwern [link] [comments]
    Action Space Dimensional reduction for better convergence
    I am working on a project in which a robot learns its motion, for example a bipedal robot that learns to walk in a straight line by learning to adjust the torque and angular velocity of each joint. However, the robot I am working on has a complex architecture: it has 10 joints instead of 2, and most importantly all of these joints have to work simultaneously and coherently to produce a desired motion. The problem I am facing is that the robot has ten joints and each joint can move between -450 and +450. For simplicity, let me define the state and actions of the system: State = -450 to +450, normalized to -10 to 10; Action = choose an angle between -10 and 10 for each joint. Total action space for a single motion = 10 (number of joints, each moving between -10 and 10 at each time step) * 360 (total time steps for a single motion) = 3600 (output: the number of angles required to generate a motion). I am using TD3 to solve this conundrum. The action space is too large; how can I reduce it? submitted by /u/SAM_Baloch [link] [comments]  ( 2 min )
    Dynamic action space in RL
    I am doing a project and have a problem with a dynamic action space. The complete action space can be divided into four parts, and in each state the action must be selected from one of them. For example, the total discrete action space has length 1000 and can be divided into four parts: [0:300], [301:500], [501:900], [901:1000]. For state 1 the action space is [0:300], for state 2 it is [301:500], etc. I have several ideas at present: 1) No restriction at all: the legal actions in all states are [1:1000], but this may need a longer training time and there is not much innovation. 2) Soft constraint: if state 1 selects an illegal action, such as one in [251:500], the reward is a negative value, but this is also not innovative. 3) Hard constraint: use an action-space mask in each state, but I don't know how to do it. Is there any relevant article? 4) Divide directly into four action spaces and use multi-agent cooperative relationship learning. Any suggestions? Thanks! submitted by /u/RangerWYR [link] [comments]  ( 2 min )
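For the hard-constraint option, the standard recipe is to add -inf to the logits of illegal actions before the softmax, so they receive exactly zero probability (and, with autodiff, zero gradient). A minimal numpy sketch using the ranges from the post; the logits here are random placeholders for a policy network's output:

```python
import numpy as np

def masked_policy(logits, legal_lo, legal_hi):
    """Hard action-space constraint via masking: illegal logits are set to
    -inf before the softmax, so illegal actions get exactly zero probability."""
    mask = np.full_like(logits, -np.inf)
    mask[legal_lo:legal_hi + 1] = 0.0
    masked = logits + mask
    # Subtract the max over legal entries for numerical stability.
    e = np.exp(masked - masked[legal_lo:legal_hi + 1].max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=1000)
# State 1's legal actions are [0, 300]; everything else gets probability 0.
probs = masked_policy(logits, 0, 300)
```

Per state, you would look up the legal range (or a boolean mask) and apply it to the same shared policy head, so one network serves all four sub-spaces.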
    Any paper suggestions??
    Hi everyone, I have to define a project for my master's degree, so I'm looking for the best papers published from 2018-2019 until now in reinforcement learning. Do you have any suggestions, titles, or projects that I can check? submitted by /u/acaviedes15 [link] [comments]  ( 1 min )
  • Open

    Responsible AI in a Global Context
    submitted by /u/john133435 [link] [comments]
    AI website that transitions photos into video?
    I remember using a website about a year ago where you could put in 2 or more images, and it would make a transition between them with AI. Then you could export the video. You could also very extensively edit human faces and change small features on a scale from 1-100. The features were incredibly specific, like brow bone and nasal bridge. If anyone has the website I would appreciate it!! submitted by /u/yungbenz0_bajs [link] [comments]  ( 1 min )
    How Artificial Intelligence Is Impacting Today’s Businesses
    submitted by /u/mr_j_b [link] [comments]
    Alibaba’s AI tool to improve efficiency of China’s waste-to-energy plants
    submitted by /u/mr_j_b [link] [comments]
    Best GAN for Tabular-data
    What in your opinion is the best GAN for tabular-data. Please include any references if you have any. submitted by /u/ily_jk [link] [comments]
    Supercharged UI for MLflow
    Hi guys, we've built a plugin that seamlessly reads MLflow logs and provides a beautiful UI to compare multiple runs with just a few clicks. You can: filter runs with a super versatile, fully pythonic search; group and aggregate your metrics/images. We are trying to make it work seamlessly with MLflow and complement its other awesome features 🎉 Here is more info about it: https://aimstack.io/aimlflow Would love your feedback!! submitted by /u/ManeSa [link] [comments]  ( 1 min )
    Takeaways From 3 Years Working In Machine Learning
    submitted by /u/elcric_krej [link] [comments]
    OpenAI 's new model DALL·E 2 is amazing!
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    The AI in a jar
    submitted by /u/bendee983 [link] [comments]
    Can Computers Learn Common Sense?
    submitted by /u/estasfuera [link] [comments]
    Metaverse weekly digest: Shiba Inu’s metaverse, Alibaba’s $60 million VR investment
    submitted by /u/bent_out_of_shape_ [link] [comments]
    Meet ‘ChestLink’, The First Autonomous AI Medical Imaging Application by ‘Oxipit’ That Received CE Mark Approval in the EU
    ​ https://preview.redd.it/e2q3jit3c8s81.png?width=1024&format=png&auto=webp&s=837aa6256647df6fb8777a02b04313a38428f573 The most common diagnostic imaging test conducted in emergency rooms is chest radiography. Providing automated preliminary read helpers to physicians might speed up surgery, enhance accuracy, and lower healthcare costs. An artificial intelligence tool that interprets chest X-rays without the intervention of a radiologist received regulatory approval in the European Union this week, marking a first for a wholly autonomous medical imaging AI, according to ‘Oxipit‘, the developer of this tool. It’s a watershed moment for AI, and it’s more than likely to spark debate, given that radiologists have spent the last few years working to fully automate parts of their jobs. Continue Reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
  • Open

    Large-Scale Matrix Factorization on TPUs
    Posted by Harsh Mehta, Software Engineer, Google Research Matrix factorization is one of the oldest, yet still widely used, techniques for learning how to recommend items such as songs or movies from user ratings. In its basic form, it approximates a large, sparse (i.e., mostly empty) matrix of user-item interactions with a product of two smaller, denser matrices representing learned item and user features. These dense matrices, in turn, can be used to recommend items to a user with which they haven't interacted before. Despite its algorithmic simplicity, matrix factorization can still achieve competitive performance in recommender benchmarks. Alternating least squares (ALS), and especially its implicit variation, is a fundamental algorithm to learn the parameters of matrix factorization…  ( 9 min )
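The ALS update the post describes alternates two ridge-regularized least-squares solves, one per factor. A minimal dense numpy sketch under stated assumptions: this is the explicit-feedback case on a fully observed matrix (the implicit variant mentioned in the post additionally reweights observed entries), and the rank, regularization strength, and iteration count are placeholder choices:

```python
import numpy as np

def als(R, rank=2, reg=0.1, iters=50, seed=0):
    """Alternating least squares for R ~ U @ V.T (dense, explicit feedback).

    Each half-step solves a ridge-regularized least-squares problem for one
    factor with the other held fixed.
    """
    m, n = R.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(m, rank))
    V = rng.normal(scale=0.1, size=(n, rank))
    reg_eye = reg * np.eye(rank)
    for _ in range(iters):
        # Solve (V^T V + reg I) U^T = V^T R^T, then the symmetric step for V.
        U = np.linalg.solve(V.T @ V + reg_eye, V.T @ R.T).T
        V = np.linalg.solve(U.T @ U + reg_eye, U.T @ R).T
    return U, V

rng = np.random.default_rng(1)
R = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))  # exactly rank 2
U, V = als(R)
err = np.linalg.norm(R - U @ V.T) / np.linalg.norm(R)
```

Each solve is embarrassingly parallel across rows, which is what makes ALS a good fit for accelerators like TPUs.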
  • Open

    Rock On: Scientists Use AI to Improve Sequestering Carbon Underground
    A team of scientists have created a new AI-based tool to help lock up greenhouse gases like CO2 in porous rock formations faster and more precisely than ever before. Carbon capture technology, also referred to as carbon sequestration, is a climate change mitigation method that redirects CO2 emitted from power plants back underground. While doing Read article > The post Rock On: Scientists Use AI to Improve Sequestering Carbon Underground appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Build a custom entity recognizer for PDF documents using Amazon Comprehend
    In many industries, it’s critical to extract custom entities from documents in a timely manner. This can be challenging. Insurance claims, for example, often contain dozens of important attributes (such as dates, names, locations, and reports) sprinkled across lengthy and dense documents. Manually scanning and extracting such information can be error-prone and time-consuming. Rule-based software […]  ( 7 min )
    Getting started with the Amazon Kendra Box connector
    Amazon Kendra is a highly accurate and easy-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. For many organizations, Box Content Cloud is a core part of their content storage and lifecycle management […]  ( 6 min )
  • Open

    Hilbert transform and Mathematica
    The Hilbert transform of a function f(t) is a function fH(x) defined [1] by The integral must be interpreted in the sense of the Cauchy principal value: The integrand is not absolutely integrable because of the singularity at x and so the value of the integral depends on how you handle the singularity. The Cauchy […] Hilbert transform and Mathematica first appeared on John D. Cook.  ( 2 min )
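The excerpt's formulas did not survive syndication; up to sign convention, the definition being referred to is

```latex
f_H(x) = \frac{1}{\pi}\,\operatorname{p.v.}\!\int_{-\infty}^{\infty} \frac{f(t)}{x - t}\,dt ,
```

where p.v. denotes the Cauchy principal value discussed in the excerpt; the singularity at t = x is why the integrand is not absolutely integrable.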
    Visual integration
    The plot below is of a meromorphic function f(z). That is, the function f(z) is analytic except possibly at poles, and the colors represent the phase angles, the values of θ if you write the function values in polar form. What is the value of the integral where C is the perimeter of the square? […] Visual integration first appeared on John D. Cook.  ( 2 min )
  • Open

    Dense Passage Retriever(DPR) Open-QA System
Hi, I made a video explaining the Dense Passage Retriever (DPR) paper. We specifically explain the end-to-end QA system suggested in the latter part of the paper, which discusses how to build an Open-QA system using dense retrievers. DPR was one of the first papers to discuss building dense retrievers using QA pairs only, without requiring a big pretraining computational setup like ORQA or REALM. It is currently used in a lot of places as a dense retriever. You can also find Hugging Face and Haystack implementations. This video is part of a series on Open-QA using dense retrievers. We have made 2 videos on DPR; in the latter, we discuss how to build a dense retriever from scratch. Thanks for the support, and it would be great if you could give any feedback. https://www.youtube.com/watch?v=rvcyyJNjPU0 submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
  • Open

    Artificial Intelligence: Benefits for Automation Testing
AI has been making a lot of noise of late, especially in the context of software development. Of course, this topic is quite wide, but in this article we shall focus our attention on AI-driven automation testing. Let us start by understanding what AI and automation testing are. Automation testing refers to the process of… Read More »Artificial Intelligence: Benefits for Automation Testing The post Artificial Intelligence: Benefits for Automation Testing appeared first on Data Science Central.  ( 3 min )

  • Open

    Attend the 2022 National Autonomous Vehicle Expo (April 16-17th)
Interested in the future of autonomous vehicles? Want to know more about the impacts of this technology? Join us on April 16-17th at the 2022 National Autonomous Vehicle Expo to discover the engineering, ethics, and policymaking of this emerging technology. The virtual expo consists of speaker and workshop sessions led by industry-leading companies, such as NVIDIA, Waymo, and Motional, as well as distinguished programs/organizations like MIT Beaverworks and InspiritAI. You will also have the opportunity to compete in our hackathon, where you can win a variety of cool prizes! Even if you don't participate in the hackathon, there will be free merchandise and giveaways throughout the expo! To register and/or view more information about the event, head over to avexpo.org. For hackathon-specific registration, you can visit our devpost at https://autonomous-vehicle-expo.devpost.com/. Hope to see you all there! submitted by /u/avexpo22 [link] [comments]  ( 1 min )
    Resources about cognitive theories
    Hi! I am new to the community, and was wondering what y'all's favorite resources were to learn about cognitive theories and how they will shape future AI advancements. YouTube channels would be great. submitted by /u/Apprehensive-Candy97 [link] [comments]
    Andrew Yang & Yuval Noah Harari: Tech, Public Policy & the Future of Work
    submitted by /u/john133435 [link] [comments]
    AI News | AI News | Why AI Made 40,000 New Chemical Weapons Compounds in 6 Hours | Cancer Treatment AI Breakthrough
    submitted by /u/getrich_or_diemining [link] [comments]
How to create a bot for an existing game?
I want to create a bot for a game that basically consists of: gather resources, craft items, sell them. The problem is, some items have different qualities, and I want to automate the process so it identifies the good stuff to keep and sells the bad stuff. What's the best way to do that? I work with desktop systems, so I'm not familiar with this kind of thing, but I often read about Python and some frameworks. What do you recommend I start with? submitted by /u/AbbathDoom [link] [comments]  ( 1 min )
    DALL·E 2: A new AI system to create realistic images and art from natural language commands
    submitted by /u/alien128 [link] [comments]
What are other "technology" fields that are good to learn while studying AI?
Hello! What do you guys think are other "technology" fields that would be good to study with AI? It is okay as long as it is "tech." Which tech field would be most beneficial in the future? My goal is to make a self-aware AI (AGI). I have been fascinated by AI since my childhood, which is why I'm going to pursue this field. Also, I am currently studying game development to make a VR game that will hopefully have humanlike AI in it. I have read a LOT of articles about the future of AI, and cybersecurity keeps popping up, because superintelligent AI needs to be PROTECTED from hackers (based on the articles), otherwise it is over. What do you guys think is the tech field that will bring the most changes in the future? submitted by /u/ThatOneEpicAstronaut [link] [comments]  ( 1 min )
    Does this artificial intelligence think like a human?
    submitted by /u/qptbook [link] [comments]
    Artificial intelligence Courses for Healthcare
We keep hearing about how artificial intelligence and machine learning are going to revolutionise medicine. But what's hype, and what's realistic? And how can you get involved? The first step is to understand the technology: where it's well suited to healthcare (and where it isn't). When it comes to health care, especially in life-and-death situations, AI has made things much easier for us, and it is still expected to drastically change the way medicine is practised. It may even see surgeries done by doctors replaced by surgeries done using artificial intelligence, and make diagnosing complex diseases, genetic issues and many other health problems far easier in the future. Here are the best artificial intelligence courses for healthcare you can take in 2022. submitted by /u/maneesh123456 [link] [comments]  ( 1 min )
    Artificial Nightmares: Smithing Stone 6 || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Introducing MindSpore 1.6 New Features
    submitted by /u/Creative_Habit_6868 [link] [comments]
    OpenAI's DALL·E 2 ! Text-to-Image Generation Explained
    submitted by /u/OnlyProggingForFun [link] [comments]
    How do I get into the field of AI policy and strategy?
I've read online that a career in AI policy and strategy is heavily needed, and it is actually ranked as the number one problem area for the future by 80,000 Hours. I am choosing which undergraduate degree to pursue in the fall, and I'm not sure of the best pathway to an extremely high-level position in this field. An economics degree? A computer science degree? An AI degree? Should I pursue one subject through to a PhD, or mix in other degrees/certificates? Is it a straightforward pathway focused on a single subject, or is it necessary to study and work in other fields as well? What are the typical steps? Also, if there is anything else that would be helpful along the pathway, or anything you would recommend, please let me know. submitted by /u/Key-Lawyer-7586 [link] [comments]  ( 2 min )
    Five Google Chrome Extensions that every Machine Learning / Data Science professional should know about 🚀💯
    submitted by /u/MLtinkerer [link] [comments]  ( 1 min )
  • Open

    Implementation of RL
Hi all! I am a beginner in the RL field and am trying to implement the RL algorithm in the following paper: [1912.04321] Learning to Code: Coded Caching via Deep Reinforcement Learning (arxiv.org) In short, we are trying to achieve the minimum number of transmissions of bits from the server to all users. In the paper, after 500 episodes of training the number of transmissions does decrease, but when I implement the same actor-critic algorithm this does not happen. In fact, the results seem to be completely random. Here is a plot of the same: https://preview.redd.it/yve3cay7l6s81.png?width=745&format=png&auto=webp&s=36bf4f0aa813a30024128d797acde3b8adc2df30 Although my training parameters are slightly different, I can't understand why this would happen. I used the parameters and pseudocode from this paper: A Deep Reinforcement Learning Approach for Shared Caching | IEEE Conference Publication | IEEE Xplore - which is an extension of the link at the top. Attaching a link to my code: https://www.kaggle.com/samarthtiwari123/rl-for-coded-caching Any help would be appreciated!! Thanks in advance submitted by /u/samt_123 [link] [comments]  ( 1 min )
    How can I extract the direction a specific agent is facing?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Flow of gradients through multiple classes
    Naive question: if I define my RL model as a combination of different classes (one class that preprocesses the observation, one class that processes the observation, one class that outputs the actions, etc.), is this going to affect the flow of gradients in PyTorch? The alternative would be to create only one class in which I combine everything submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Can KL divergence be used as a metric to track learning progress in PPO?
    In the hyperparameter section of the paper, it is written that the step size of Adam is varied according to the KL divergence. So I want to know whether KL divergence is the right metric for observing learning progress, because we have many states for which the probability of a particular action is either increased or decreased, so taking the average KL mixes up a lot of things. submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
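As a reference point, the quantity PPO implementations usually log for this purpose is a batch-averaged approximate KL computed from stored log-probabilities. A pure-Python sketch with made-up numbers (the k3 estimator from John Schulman's "Approximating KL Divergence" note is one common choice):

```python
import math

# Toy old/new per-action log-probabilities for a small batch (made-up numbers).
old_logp = [-1.2, -0.7, -2.3, -0.4]
new_logp = [-1.0, -0.9, -2.0, -0.5]

def approx_kl(old_logp, new_logp):
    # k3 estimator: E[(r - 1) - log r] with r = exp(new - old).
    # Each term is nonnegative, so the estimate is too; averaging over the
    # batch is exactly the "mixing over states" the question worries about,
    # but it is still the quantity adaptive-KL PPO uses to adjust step size.
    total = 0.0
    for lo, ln in zip(old_logp, new_logp):
        log_r = ln - lo
        total += (math.exp(log_r) - 1.0) - log_r
    return total / len(old_logp)

kl = approx_kl(old_logp, new_logp)
```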
    Weight decay in policy network for Discrete SAC?
    We’re finding that our network is returning a tensor of NaNs towards the end of training. Adding weight decay solves this issue but reduces learning, was wondering if anyone else had experience with vanishing gradients in off-policy methods or any insight? submitted by /u/TerrificJam [link] [comments]  ( 1 min )
  • Open

    [D] self attention visualization
    Has anyone ever come across seemingly chaotic self-attention maps during visualization? If your model is performing well but no insights can be gleaned from the visualization, how do you explain it in a paper? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
    [N] PaLM's (Google's 540B LLM) training costs around $9M to $17M.
    Here's the blogpost estimating the cost. What would it cost you to train PaLM using cloud computing (and you're not Google)? Something around $9M to $17M. submitted by /u/cirqe [link] [comments]  ( 1 min )
    [D] Feature selection methods
    I'm working on an ML project with a dataset of 20 columns. For feature selection, I removed columns one at a time and looked at the error of the ML outputs for each, then kept whichever removal gave a lower error, and repeated. But that didn't seem to help the model at all, and the error went down very little. Is this an okay way of doing feature selection, or is there another way that gives better results? I also tried PCA, LDA and the Pearson correlation method in Python, and those didn't seem to help either. Or is this the best I can do? Thanks! submitted by /u/ihshosv [link] [comments]  ( 1 min )
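The loop described is greedy backward elimination; with a cross-validated score and a stopping rule it looks roughly like this (synthetic data and plain least squares stand in for the poster's dataset and model):

```python
import numpy as np

# Synthetic stand-in: 5 features, of which columns 0-2 are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

def cv_mse(X, y, folds=5):
    # Mean squared error of ordinary least squares under simple k-fold CV.
    idx = np.array_split(np.arange(len(y)), folds)
    errs = []
    for f in idx:
        train = np.setdiff1d(np.arange(len(y)), f)
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.append(np.mean((X[f] @ w - y[f]) ** 2))
    return float(np.mean(errs))

cols = list(range(X.shape[1]))
while len(cols) > 1:
    base = cv_mse(X[:, cols], y)
    # Score dropping each remaining column ...
    scores = [cv_mse(X[:, [c for c in cols if c != d]], y) for d in cols]
    best = int(np.argmin(scores))
    if scores[best] > base:        # ... and stop once every removal hurts
        break
    cols.pop(best)
```

scikit-learn's `RFECV` automates the same pattern and is cheap to run on 20 columns; if greedy elimination barely moves the error, the features may simply all carry signal.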
    [D] Is overfitting a sign of high learning capacity?
    This is a two-part question: If a neural network can overfit a large dataset, is this a sign that the network has high learning capacity? And if a neural network can overfit a dataset with substantially fewer parameters than other neural networks developed for the same learning task, is this a sign that the network has a high learning capacity relative to those networks? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
    [D] Training a CNN with synthetic data: should I mix synthetic and real data and train from scratch, or pretrain on synthetic data and fine-tune on real data?
    I'm doing research on the use of synthetic data for a computer vision task, and I have generally trained in a mixed setting from scratch, but I have noticed that in similar papers researchers always pretrain on synthetic data first and then fine-tune on real data. Is there a logic behind that? Should I expect better results from fine-tuning? submitted by /u/TheManveru [link] [comments]  ( 2 min )
    [R] Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data
    Hi Everyone, We have recently published the code for our AISTATS 2022 paper - Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data [Video segmentation example] In our work, we have proposed a solution for clustering streaming data. Unlike 'standard' clustering scenarios, in the streaming case the data stream is possibly infinite, you cannot backtrack to previously processed points, and the data statistics are dynamic and change over time. Our solution is based on the Dirichlet Process Mixture Model (DPMM), can work with different types of observations, and is very fast, outperforming other methods both in the quality of the results and the speed with which it achieves them. It can even be distributed across several processes and/or machines! Paper: https://dinarior.github.io/papers/Dinari_AISTATS_streaming.pdf Code (Julia Package): https://github.com/BGU-CS-VIL/DPMMSubClustersStreaming.jl Code (Python wrapper): https://github.com/BGU-CS-VIL/dpmmpythonStreaming Notebook (Julia) for creating the video: https://nbviewer.org/github/BGU-CS-VIL/DPMMSubClustersStreaming.jl/blob/main/examples/VideoSeg.ipynb submitted by /u/dinarior [link] [comments]  ( 1 min )
    [D] Best way to handle encoding disconnected graphs at the graph level.
    I am thinking of building a graph classifier that takes in graphs and labels the incoming graph. The dataset of interest to me is RadGraph: https://arxiv.org/abs/2106.14463 The issue I am having is that the graphs in RadGraph are disconnected in nature (on average 20 disconnected components), making it difficult for the various graph encoders I am aware of to do a good job classifying the graphs. submitted by /u/AICoderGamer [link] [comments]  ( 1 min )
    [D] Does anyone know how much faster DeepSpeed's transformer implementation is?
    Implementation here. It looks like they manually calculate the gradient? I'm very curious how much of a difference this makes! submitted by /u/fasttosmile [link] [comments]  ( 1 min )
    [R] My research group is publicly sharing its paper presentations! Check it out!
    https://outsystems-ai-reading-group.github.io/ submitted by /u/JClub [link] [comments]
    [R] On-the-fly Strategy Adaptation for ad-hoc Agent Coordination
    submitted by /u/hardmaru [link] [comments]
    [D] Any good free to use DALL-E style datasets?
    Are there any free to use datasets that contain image/annotation pairs in the style OpenAI used to train the DALL-E models? Pretty inspired by DALL-E 2 and think it would be cool to create a tiny less powerful replication submitted by /u/puppet_pals [link] [comments]  ( 1 min )
    [D] TensorFlow tf.range() vs range()
    TLDR: TensorFlow AutoGraph unwraps native Python ranges, baking each value into the graph. This can be an unexpected cause of graph size explosion. This recently caused an issue in my project, so I thought I'd share some more details: https://lukewood.xyz/blog/to-unroll-or-to-not-unroll submitted by /u/puppet_pals [link] [comments]  ( 1 min )
    [R] A benchmarking framework for time-series unsupervised domain adaptation
    Our work "AdaTime: A Systematic Evaluation of Domain Adaptation Algorithms on Time Series Data" is now public. We provide a benchmarking framework named "AdaTime" to fairly evaluate unsupervised domain adaptation (UDA) approaches on time-series data. We find that UDA approaches proposed for visual data can be directly applied to time-series data and still achieve excellent performance, even better than methods specifically proposed for time-series UDA. We were impressed by the consistently superior performance of the "DIRT-T" method on all the datasets. We provide the code publicly on GitHub: https://github.com/emadeldeen24/AdaTime submitted by /u/emad_eldeen [link] [comments]  ( 1 min )
    [P][R] Announcing: Dataset & Denoising Shabby Pages Competition
    Into machine learning? Want a chance to earn a new MacBook Pro? Check out the Denoising ShabbyPages competition! The ShabbyPages dataset is being produced as a way to help train, test, and calibrate computer vision machine learning algorithms designed for working with documents. Enter the competition by training a model to remove the noise, and be awarded a MacBook Pro or some swag in the process! Check out the short paper introducing the dataset, and learn more about the competition at denoising-shabby.com. submitted by /u/proofconstruct [link] [comments]  ( 1 min )
    [R] FaceSigns: Semi-Fragile Neural Watermarks for Media Authentication and Countering Deepfakes
    Hi Everyone! We have released the preprint and google colab demo for our paper FaceSigns. FaceSigns embeds a secret bit-string as a semi-fragile watermark in the image pixels. The message is recoverable if benign image operations such as color/contrast adjustment, JPEG compression, Instagram filters are applied. However, the message cannot be decoded if the image is facially tampered (eg. DeepFake manipulation) . This selective fragility allows reliable detection of DeepFake manipulations applied on images signed using FaceSigns. Try out our google colab demo to see message encoding and decoding using FaceSigns! Paper: https://arxiv.org/abs/2204.01960 Project Webpage: https://shehzeen.github.io/facesigns Demo: https://github.com/paarthneekhara/FaceSignsDemo submitted by /u/LynxCompetitive7637 [link] [comments]  ( 1 min )
    [D] Machine learning models / ideas for Google search ads?
    Hi guys! I work in house and I’m part of our Google search team. Our ad spend is pretty large (9 figures per year, in USD). We build/manage stuff at scale using SQL, R, Javascript, and so on. So everything is pretty much “big data” in flavour. Lately I’ve been more and more interested in data science, and I’m looking to take things to the next level by incorporating machine learning into our workflow. I’d really love to build some useful machine learning models using popular Python libraries such as Pandas, SciKit Learn, NumPy, TensorFlow, PyTorch, and so on. Any suggestions on cool, and most importantly useful machine learning models I could build? (By “useful”, I mean something that could help increase the profits.) I think some classification, predictive, or recommender models would be great to start with. Cheers! 😄 submitted by /u/TropicalBound111 [link] [comments]  ( 2 min )
  • Open

    VDTTS: Visually-Driven Text-To-Speech
    Posted by Tal Remez, Software Engineer, Google Research and Michael Hassid, Software Engineer Intern, Google Research Recent years have seen a tremendous increase in the creation and serving of video content to users across the world in a variety of languages and over numerous platforms. The process of creating high quality content can include several stages from video capturing and captioning to video and audio editing. In some cases dialogue is re-recorded (referred to as dialog replacement, post-sync or dubbing) in a studio in order to achieve high quality and replace original audio that might have been recorded in noisy conditions. However, the dialog replacement process can be difficult and tedious because the newly recorded audio needs to be well synced with the video, requiring …  ( 7 min )
  • Open

    Data Observability: Cracking the Code
    ‍What is the shortest distance between two points? A straight line of course. What if there are multiple points? Then, it depends.  A job executed in response to a user action – refreshing a dashboard, aggregating data, building a report, developing an ML algorithm, performing analytics – all require multiple hops through the data ecosystem.… Read More »Data Observability: Cracking the Code The post Data Observability: Cracking the Code appeared first on Data Science Central.  ( 5 min )
  • Open

    NN from Scratch: #2 Initializing parameters | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
  • Open

    Regular expressions and successive approximation
    Regular expressions can do a lot of tasks in practice that they cannot do in theory. That’s because a particular application of regular expressions comes with context and with error tolerance. For example, much has been said about how regular expressions cannot parse HTML. This is strictly true, but it says nothing about how well […] Regular expressions and successive approximation first appeared on John D. Cook.  ( 3 min )
  • Open

    Try This Out: GFN Thursday Delivers Instant-Play Game Demos on GeForce NOW
    GeForce NOW is about bringing new experiences to gamers. This GFN Thursday introduces game demos to GeForce NOW. Members can now try out some of the hit games streaming on the service before purchasing the full PC version — including some finalists from the 2021 Epic MegaJam. Plus, look for six games ready to stream Read article > The post Try This Out: GFN Thursday Delivers Instant-Play Game Demos on GeForce NOW appeared first on NVIDIA Blog.  ( 3 min )

  • Open

    Question about Model Predictive Control (MPC) cost function
    To my understanding, the cost function is the error between the predicted state value and the real state value. So if I use a neural network as my dynamics model (the true dynamics being unknown), is the MPC cost function equivalent to the NN's loss function? submitted by /u/Blasphemer666 [link] [comments]  ( 1 min )
    Learning To Play "Settlers of Catan" With Deep RL - code and write-up
    submitted by /u/henrythepaw [link] [comments]  ( 1 min )
    Which environments do you use for benchmarking?
    Hey guys I'm curious which environments you use to benchmark your standard RL algorithms. I typically use some environments from the OpenAI Gym or the DM control suite but benchmarking all my implementations against all environments for multiple seeds would take forever. Are there some of their environments you particularly like for benchmarking? submitted by /u/NiconiusX [link] [comments]  ( 1 min )
    How is advantage estimation done when episodes have variable length in PPO?
    In the PPO paper it is stated that we collect trajectories of length T from N different workers. Suppose I am not using multiple workers; then I have to collect N rollouts of fixed length T. But episode lengths are variable, i.e. some episodes end well before T and some well after T. So my question is how we calculate the advantage, because according to the PPO paper, for the generalized advantage estimate, we have to observe the reward of the terminal state. So how should I calculate GAE in this case? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
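A common practical answer (a sketch with toy numbers, not taken from the paper): keep a per-step done mask, reset the GAE recursion at episode boundaries inside the length-T rollout, and bootstrap with the value of the final state only when the rollout was truncated rather than terminated:

```python
import numpy as np

def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    # GAE over a fixed-length rollout that may contain episode boundaries.
    T = len(rewards)
    adv = np.zeros(T)
    gae_t = 0.0
    for t in reversed(range(T)):
        # If step t ended an episode, the (1 - done) mask zeroes both the
        # bootstrap value and the running accumulator, restarting the sum.
        next_value = last_value if t == T - 1 else values[t + 1]
        mask = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * mask - values[t]
        gae_t = delta + gamma * lam * mask * gae_t
        adv[t] = gae_t
    return adv

# Toy rollout of T=4 steps in which an episode terminates at step 1.
rewards = np.array([1.0, 0.0, 1.0, 0.0])
values  = np.array([0.5, 0.4, 0.6, 0.3])
dones   = np.array([0.0, 1.0, 0.0, 0.0])
adv = gae(rewards, values, dones, last_value=0.2)
```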
    A silly question from a beginner
    I could not find an answer to a question that has been hanging around in my head for a while. Suppose we have some data, and we build an MDP that captures the action and state dynamics. Would the resulting optimal policy beat state-of-the-art RL algorithms? Edit: If that is the case, why does the community bother with learning algorithms, since finding the model of the dynamics is the key? submitted by /u/musicinthedark [link] [comments]  ( 2 min )
    What does the output of a reinforcement learning agent/algorithm look like in practice?
    Hello, I am relatively new to the area of machine learning/reinforcement learning, and I have a basic question regarding practical implementations. What does the output of a reinforcement learning agent/algorithm look like in practice? Is it like a lookup table that sets the weights/parameters of the ML model based on the input data? Note that I am asking about what happens after the offline training of the agent. How do you implement the trained agent in practice, e.g. in an embedded system? Do you have references or clues to help me clarify this? BR submitted by /u/b0bzera [link] [comments]  ( 1 min )
    Multi-discrete action spaces in SAC (Soft Actor-Critic)
    Hello! I am using SAC (Soft Actor-Critic) for a reinforcement learning task with only four steps, where each action comes from one of four different action spaces. These four action spaces are essentially the same: they are all chemical compounds, and I just want the agent to take a different type of compound at each step. I have the following questions: Can the four different steps be trained? There is a paper whose reinforcement learning process also has only four steps, but it uses only one discrete action space. Is there any article I can learn from? I know that for game-controller inputs there are usually multiple discrete action spaces, but each discrete space in my task has a larger dimension, such as [800, 700, 500, 600]. Thanks! submitted by /u/RangerWYR [link] [comments]  ( 1 min )
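One common way to handle this (a generic sketch, not from a specific paper) is a factored policy head: one categorical distribution per sub-space, where the joint log-probability needed by SAC's entropy term is the sum of the per-space log-probabilities. In pure Python, with random logits standing in for a network's outputs and sizes matching the poster's example:

```python
import math
import random

sizes = [800, 700, 500, 600]        # one discrete sub-space per step

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(logits_per_space):
    # Independent categorical sample per sub-space; the joint log-prob
    # factorizes, so it is just the sum of the per-space log-probs.
    action, logp = [], 0.0
    for logits in logits_per_space:
        probs = softmax(logits)
        a = random.choices(range(len(probs)), weights=probs)[0]
        action.append(a)
        logp += math.log(probs[a])
    return action, logp

random.seed(0)
logits_per_space = [[random.gauss(0, 1) for _ in range(n)] for n in sizes]
action, logp = sample_action(logits_per_space)
```

In a discrete-SAC setting, the same factorization lets the entropy and the policy loss be computed per sub-space and summed, keeping the head sizes at 800 + 700 + 500 + 600 instead of their product.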
    Is a multi-armed bandit on-policy or off-policy?
    Quick question submitted by /u/Asleep_Donut1382 [link] [comments]
    How wrong is it to use sampling at inference time?
    At my company we use RL to solve our problem. The thing is: our problem is rather complex, and it is the core of our product (so clients rely on the results produced). In order to reach satisfying results despite an agent that doesn't learn very well, we use sampling at inference time: instead of taking the best trajectory according to the agent, we take X trajectories and keep only the one with the best reward. This seems completely fine at first (similar things are done in NLP, for example, with beam search), but in our case the sampling size is huge: 1024. Usually when using beam search, we use maybe a beam size of 6. Maybe 10 if you have good hardware? Now, the agent does seem to be learning: the mean return is slightly increasing over time, the entropies of the actions are steadily decreasing, etc. The goal of the ML team is to improve the agent's learning in order to decrease the sampling size at inference time (because it's costly to run 1024 trajectories through the environment). But whatever we try, the improvements are not reflected (we compare all our experiments with 1024 samples, in order to see what the customers will see). IMO this is because our sampling size is way too big; even a random agent can produce okay-ish results. Is my intuition right? submitted by /u/dummy-gummy [link] [comments]  ( 2 min )
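For concreteness, the scheme described is best-of-N sampling; a toy sketch (the policy and reward below are stand-ins for the company's agent and environment, not its actual setup):

```python
import random

def rollout(policy, env_reward, horizon=10):
    # One trajectory: sample `horizon` actions, score with the reward fn.
    traj = [policy() for _ in range(horizon)]
    return traj, env_reward(traj)

def best_of_n(n, policy, env_reward):
    # Sample n trajectories, keep the highest-reward one.
    best_traj, best_r = None, float("-inf")
    for _ in range(n):
        traj, r = rollout(policy, env_reward)
        if r > best_r:
            best_traj, best_r = traj, r
    return best_traj, best_r

policy = lambda: random.choice([0, 1])   # stand-in "agent"
env_reward = sum                         # stand-in reward: count of 1-actions

random.seed(0)
_, r1 = best_of_n(1, policy, env_reward)
random.seed(0)                           # same stream, so r1024 >= r1
_, r1024 = best_of_n(1024, policy, env_reward)
```

Because the max over many i.i.d. samples grows only slowly with N, a large N can mask real differences between agents, which is consistent with the poster's intuition that comparing experiments at N = 1024 hides their improvements.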
    Does a master's thesis/doctoral dissertation need to have implications down the line for it to be good?
    I'm in the last semester of my undergraduate degree; over the past couple of weeks, I've been trying to brainstorm ideas that I would like to pursue in my graduate research career. I'm interested in the emergence of language in multi-agent reinforcement environments but I can't see how this would be important down the line when there are large language models that are completely dominating language and communication. Should this stop me from pursuing this idea or should I let my interest in the idea take precedence? submitted by /u/clarky103 [link] [comments]  ( 1 min )
    How long would it take you to implement a MARL PPO agent with joint attention architecture?
    Out of curiosity, how long would it take to implement a paper like this one? https://arxiv.org/abs/2104.07750 It has PPO agents in MARL, all of them with multihead attention performed on the observation, in such a way that an attention map is created for each agent. This attention map has information about how strongly each agent is attending to various elements of the environment. With KL divergence, the agents are rewarded for minimizing the difference between their attention maps. submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
  • Open

    [R] Using Gamma Distribution to Improve Long-Tail Event Predictions at Doordash
    Predicting longtail events can be one of the more challenging ML tasks. Last year my team published a blog article where we improved DoorDash’s ETA predictions by 10% by tweaking the loss function with historical and real-time features. I thought members of the community would be interested in learning how we improved the model even more by using a Gamma distribution-based inverse sampling approach to loss function tuning. Please check out the new article for all the technical details and let us know your feedback on our approach. https://doordash.engineering/2022/04/06/using-gamma-distribution-to-improve-long-tail-event-predictions/ submitted by /u/pmp-dash1 [link] [comments]  ( 1 min )
    [D] ICML author response: what do reviewers expect?
    Hi, we submitted to ICML for the first time. We got 4 reviews and 3 of them are mostly positive. Major comments from the reviewers include: more justification of the assumptions, discussion of the choices of parameters, and experiments in more complex and different environments. We want to address all the major and minor comments as best we can, but given that the response is limited to one page, we cannot explain everything in detail. I am not sure what the acceptable norm is here. Do reviewers expect the authors to conduct some experiments during the rebuttal and provide sample results, or just to explain what additional experiments we will conduct and how we will do them? Should the justification and reasoning be detailed, or does a brief explanation with an assurance to add a detailed discussion in the final version suffice? TIA submitted by /u/srvsinha186 [link] [comments]  ( 1 min )
    [D] Questions for a TPM for ML interview at Google
    Hey all, I have a technical program manager interview coming up for an ML team at Google, and I want to know if anyone has sample role-related questions I can gauge myself with. I have a strong data science & statistics background, but that doesn't always translate to deep ML knowledge like an ML engineer might have. Any resources or sample questions? Googling has not given me adequate results for this team area specifically. submitted by /u/math_is_my_religion [link] [comments]  ( 1 min )
    [D] Anyone knows any high accuracy models on UCI adult dataset?
    Hi everyone. This is my first-time post here, and I hope I did not break any sub rules. Currently, I am doing some research with the UCI Adult dataset (https://archive.ics.uci.edu/ml/datasets/adult). The first step is to build a high-accuracy classifier model. Does anyone know of any high-accuracy models on this dataset (more than 90%)? I have tried many machine learning models, like logistic regression and neural networks, but no matter how complex the model is, I can only get an accuracy of about 85% on the test set. I tried to google it, and I found that many others also report similar results of about 85%. Any posts or papers would be helpful! Thanks in advance for your help! submitted by /u/Akasakura888 [link] [comments]  ( 1 min )
    [R] Using Gamma Distribution to Improve Long-Tail Event Predictions
    Predicting long-tail events can be one of the more challenging ML tasks. Last year my team published a blog article where we improved DoorDash’s ETA predictions by 10% by tweaking the loss function with historical and real-time features. I thought members of the community would be interested in learning how we improved the model even more by using a Gamma distribution-based inverse sampling approach to loss function tuning. Please check out the new article for all the technical details and let us know your feedback on our approach. https://doordash.engineering/2022/04/06/using-gamma-distribution-to-improve-long-tail-event-predictions/ submitted by /u/pmp-dash1 [link] [comments]  ( 1 min )
    [D] Reading the Tea Leaves: Expert End-Users Explaining the Unexplainable
    Hey there, just a heads-up: we at The Gradient just published a new article discussing explainability. "This article uses the common backdrop of competitive games to explore the ways in which domain experts adapt to new technologies that lack explainability. I illustrate how interpretations vary based on user experience and model architecture, and how special care must be taken when adapting models to human-centric problems." Check it out here if you think it's interesting / worth discussing: Reading the Tea Leaves: Expert End-Users Explaining the Unexplainable submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    [Project] Learning to Play "Settlers of Catan" With Deep RL - Writeup and Code
    Hi all, I just wanted to share a project I've been working on for the past year - using deep RL to learn to play the board game Settlers of Catan. I expect everyone is aware of the results that DeepMind/OpenAI have got recently on Go, DOTA 2, Starcraft 2 etc, but I was motivated to see how much progress could be made with existing RL techniques on a reasonably complex game - but with access to significantly less computational resources. Whilst I didn't end up with an agent that performs at a super-human level, there was clear learning progress and the results were quite interesting. I decided to do a full write-up of the project here, which I figured could be useful for anyone else who is interested in trying to apply DRL to a new, complicated environment. I also open-sourced all the code here for anyone interested. If anyone has any feedback or any questions at all that'd be great! submitted by /u/henrythepaw [link] [comments]  ( 3 min )
    [D] ICML rebuttals optional or semi-mandatory?
    Hi, We just submitted to ICML 2022 and got our reviews back. We were excited to see that 4/4 reviews were positive and acknowledged the contribution of the paper. However, there were some minor criticisms (e.g. not thorough enough literature review, could use a few more experiments) across several reviews. I was wondering if it is ever acceptable to not submit a rebuttal. Can a rebuttal in this case actually hurt us by rocking the boat, or is the norm for ICML that you should always submit a rebuttal that addresses all the reviewers' criticisms? submitted by /u/optimistic313 [link] [comments]  ( 1 min )
    [R] Hierarchical Text-Conditional Image Generation with CLIP Latents. This is the paper for OpenAI's DALL-E 2
    Blog post. Paper (pdf file format). The paper is also linked to in the above blog post. Abstract Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples. OpenAI's Sam Altman used DALL-E 2 to generate ~20 text prompt requests from Twitter users. The results are here, with individual result links and other samples in this comment from another Reddit user in a different post. Twitter thread about the paper (not from the paper authors). Sam Altman's blog post about DALL-E 2. Hopefully this summer, we’ll do a product launch and people will be able to use it for all sorts of things. submitted by /u/Wiskkey [link] [comments]  ( 3 min )
    Is the 'first boss attempt' phenomenon known to occur among NNs playing games, or is this learning trajectory unique to human players? [D]
    I'm curious about whether this unusual learning trajectory observed in humans has also been observed in artificial neural nets. A well-known phenomenon in the 'Dark Souls' video game series is that one's first attempt at a boss is often much better than subsequent attempts. Boss HP at time of death by attempt might go something like: 35%, 55%, 85%, 87%, 75%, 54%, 60%, 43%, 27%, 38%, 12%, 0%. This sounds very anecdotal, but it's known in the community of these games to be a real thing. See this thread for evidence. Have NNs playing games been known to exhibit a similar pattern, with a peak in success early on, followed by a steep descent, then a slow gradual climb? Or is this a purely human phenomenon? My hypothesis as to why this happens is that over the course of the first couple of attempts, the player learns a bunch of bad strategies which must be slowly unlearned, whereas on attempt one, the player has no defined strategies, good or bad. submitted by /u/Greenface1998 [link] [comments]  ( 3 min )
    [R] Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning
    submitted by /u/papajan18 [link] [comments]
    [Project][P] Who invented Graph Neural Networks?
    Just a side project (only for me) in which I try to sum up some history of DL. Can't be 100% sure this is the first article in which they appear: Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE transactions on neural networks, 20(1), 61-80. Would appreciate any help. Thanks submitted by /u/Siddh__ [link] [comments]  ( 2 min )
    [R] GP-BART: a novel Bayesian additive regression trees approach using Gaussian processes
    (not my paper) paper: https://arxiv.org/abs/2204.02112 abstract: "The Bayesian additive regression trees (BART) model is an ensemble method extensively and successfully used in regression tasks due to its consistently strong predictive performance and its ability to quantify uncertainty. BART combines "weak" tree models through a set of shrinkage priors, whereby each tree explains a small portion of the variability in the data. However, the lack of smoothness and the absence of a covariance structure over the observations in standard BART can yield poor performance in cases where such assumptions would be necessary. We propose Gaussian processes Bayesian additive regression trees (GP-BART) as an extension of BART which assumes Gaussian process (GP) priors for the predictions of each terminal node among all trees. We illustrate our model on simulated and real data and compare its performance to traditional modelling approaches, outperforming them in many scenarios. An implementation of our method is available in the R package rGPBART available at: https://github.com/MateusMaiaDS/gpbart." submitted by /u/bikeskata [link] [comments]  ( 2 min )
    [D] Anyone know about any interesting recent improvements with SNNs?
    I’m currently writing a research paper for my MSc on neuromorphic sensing and spiking neural networks; most good papers are from around 2015, and I was looking for something more recent. Has anyone here heard of any interesting upgrades in architecture or applications? Cheers! submitted by /u/GandhisLittleHelper [link] [comments]
    Noticing that profs focus on male student’s goals and female student’s capabilities, any weigh-in? [D]
    Hello, I’m currently a graduate student. I do different projects, and for some I get to decide what I want the scope to be. I do have to get the scope/plan/idea approved first. I pitch my ideas to profs who aren’t directly my profs, and normally 5-6 other students will pitch ideas to the same group of profs at the same time. I noticed that I get really different questions and feedback in comparison to my peers. I’m a female and my peers are male. I didn’t start out with this outlook, but I’m starting to search for reasons why I often get questioned about my capability to perform a project (which is normal enough, but I get questioned to the point where explaining my approach isn’t enough and they ask me for code examples), while my peers definitely do not get asked about their capabilities; rather, they tell the profs what they can do and they don’t get questioned. Really frustrating. submitted by /u/tyger-lily [link] [comments]  ( 4 min )
    [D] How to write a ML+Healthcare paper where the research was a framework with pre-trained models
    As a project in the course of my PhD, I had to create a prototype. My PhD is on the application of machine learning in health care. The project definition and scope were far too wide. However, I managed to create a working demo which encompasses some use cases of the project. At best, it can be called a framework, where I have put in different DL components and it works okay for those use cases only. Most of the components I have used are pre-trained language models (sometimes fine-tuned to my use case). However, there is no active training or learning involved, because I created this for a demo only. I also created a very small dataset and tested the framework over the dataset, and the results were okay. However, my supervisor now wants me to write a paper, as he is confident that the use case is rather unique and my working framework is a good first step. I believe his aim is to get me started on the paper-writing process, which I appreciate. However, I am not confident about it at all. My question is: can a 'framework' composed of pre-trained models, with the end goal of solving a problem in health care, be good enough? Are there precedents for any such paper? And if I trust my supervisor's instincts, are there any fancy ways to frame the framework so that it does not look so basic? submitted by /u/Complex_State9960 [link] [comments]  ( 2 min )
    [P] Building a knowledge based recommender system
    I am trying to build a knowledge-based recommender system but do not have prior experience. We first take in user inputs such as occasion, weather, top wear and bottom wear, and color. Based on this, we want to create a knowledge base and recommend clothes. Can anyone help me with how to go about this process step by step, and what algorithms and technologies should be used? submitted by /u/bills70 [link] [comments]  ( 1 min )
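    A knowledge-based recommender doesn't need a ratings history; a minimal starting point is a hand-written rule table (the "knowledge base") that maps the user's stated constraints to required item attributes, then filters the catalog. A rough sketch of that idea follows; the catalog, item fields, and rule contents are all made-up illustrations, not a prescribed schema:

    ```python
    # Toy catalog: each item carries the attributes the rules can constrain.
    CATALOG = [
        {"name": "wool sweater", "warmth": "warm", "formality": "casual", "color": "red"},
        {"name": "linen shirt",  "warmth": "cool", "formality": "casual", "color": "white"},
        {"name": "suit jacket",  "warmth": "warm", "formality": "formal", "color": "navy"},
    ]

    # Knowledge base: (occasion, weather) -> required attribute values.
    RULES = {
        ("office", "cold"): {"formality": "formal", "warmth": "warm"},
        ("party",  "hot"):  {"formality": "casual", "warmth": "cool"},
    }

    def recommend(occasion, weather, color=None):
        """Return names of catalog items satisfying the rule for this context,
        plus an optional hard color constraint from the user."""
        required = RULES.get((occasion, weather), {})
        hits = [item for item in CATALOG
                if all(item[k] == v for k, v in required.items())
                and (color is None or item["color"] == color)]
        return [item["name"] for item in hits]

    print(recommend("office", "cold"))  # → ['suit jacket']
    ```

    From there, hard rules can be softened into per-attribute scores and a ranking, which is the usual next step for knowledge-based recommenders.
    
    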
    [D] ICML 2022 Paper Reviews
    ICML 2022 paper reviews are supposed to be released soon. Creating a discussion thread for this year's reviews. submitted by /u/zy415 [link] [comments]  ( 2 min )
    [D] In general, should you let the model find interactions between many basic features, or should you use feature engineering to ‘help’ the model find the interaction?
    I’ll give an example to better explain my question (don’t get hung up on the numbers, it’s all made up). Say you are using a tree-based model to project how many points a player will score in a given basketball game. Most players shoot free throws at a slightly lower percentage on the road than they do at home. However, the magnitude varies player to player. Let’s assume for 95% of players with significant data, the ratio of home free throw percentage to away is 1 to 1.15. Generally speaking, older players are closer to 1 and younger players are around 1.1 (since older players get used to the opposing crowd). Now also say it takes 100 home and 100 away free throws to get a stable, reliable ratio. Now say a young player only has 50 home and 50 away free throws. With this amount of data he has a ratio of 1; however, the sample size is not enough to be fully stable. Which would be better? Option 1: several basic features: his home/away ratio, the average ratio for players his age, his home free throw count, and his away free throw count. Option 2: one feature: his 'projected' home/away ratio, which is a weighted average of his ratio with the average for players his age. Since he’s 50% of the way to significance, 0.5 * 1 + 0.5 * 1.1 = 1.05. The benefit of the first choice is that the model may find other interactions that I never conceived of; however, it could incorporate noise. Is there a general consensus, or is this just a try-both-and-see-what-works situation? submitted by /u/irndk10 [link] [comments]  ( 4 min )
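    The 'projected' ratio in option 2 is a sample-size-weighted shrinkage of the player's observed ratio toward the age-group prior. A small sketch of that arithmetic (the function name and the linear weighting scheme are my own illustration, not a settled method):

    ```python
    def shrunk_ratio(player_ratio, prior_ratio, attempts, attempts_needed):
        """Blend a player's observed home/away FT ratio with the age-group
        prior, weighted by how much of the required sample size he has."""
        w = min(attempts / attempts_needed, 1.0)  # cap at full weight
        return w * player_ratio + (1.0 - w) * prior_ratio

    # The post's example: 50 of the 100 required attempts, observed ratio 1.0,
    # age-group prior 1.1, so 0.5 * 1.0 + 0.5 * 1.1.
    print(shrunk_ratio(1.0, 1.1, 50, 100))  # → 1.05
    ```

    With enough attempts the prior's weight goes to zero and the observed ratio is used as-is.
    
    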
    Quick Little Keras Question
    I tried posting this on Stack Overflow with no response. I'm trying to use model.save() and keras.models.load_model() on this chunk of code. But, unlike some of the other Keras examples I've played with, this one seems to crash. I'm super new to this; any ideas why? I can post the error message if it helps. submitted by /u/HoneyBunchsOGoats [link] [comments]  ( 1 min )
    Driving a robot with a neural network - use case study
    submitted by /u/KamilBugnoKrk [link] [comments]
    Here's an intuitive explanation to Singular Value Decomposition. 👇
    submitted by /u/mr-minion [link] [comments]  ( 1 min )
    Weekly China AI News: Slime Robot Grabs Swallowed Objects; SenseTime Revenue Grows Despite $2.7B Net Loss; Transformer Architecture Search Without Training
    submitted by /u/trcytony [link] [comments]
    How do we know that AI hasn't already taken over our world? How do we know this isn't the matrix? #simulation
    submitted by /u/Individual-Fly-610 [link] [comments]  ( 2 min )
    DALL·E 2
    submitted by /u/roblox22y [link] [comments]
    Learn how GANs work with a cool Toonify example!
    submitted by /u/OnlyProggingForFun [link] [comments]
    Artificial Intelligence, Machine Learning and the Higgs boson - Live talk with Dr. David Rousseau
    submitted by /u/aair_x [link] [comments]
    What are your thoughts about AI teachers?
    submitted by /u/curiosityVeil [link] [comments]  ( 1 min )
    Artificial Nightmares: Beauty Parlor || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Fast and Luxurious: The Intelligent NIO ET7 EV Built on NVIDIA DRIVE Orin Arrives
    Meet the electric vehicle that’s quick-witted and fully outfitted. Last week, NIO began deliveries of its highly anticipated ET7 fully electric vehicle, in Hefei, China. The full-size luxury sedan is the first production vehicle built on the NIO Adam supercomputer, powered by four NVIDIA DRIVE Orin systems-on-a-chip (SoCs). The production launch of its flagship sedan Read article > The post Fast and Luxurious: The Intelligent NIO ET7 EV Built on NVIDIA DRIVE Orin Arrives appeared first on NVIDIA Blog.  ( 2 min )
    NVIDIA Orin Leaps Ahead in Edge AI, Boosting Leadership in MLPerf Tests
    In its debut in the industry MLPerf benchmarks, NVIDIA Orin, a low-power system-on-chip based on the NVIDIA Ampere architecture, set new records in AI inference, raising the bar in per-accelerator performance at the edge. Overall, NVIDIA with its partners continued to show the highest performance and broadest ecosystem for running all machine-learning workloads and scenarios Read article > The post NVIDIA Orin Leaps Ahead in Edge AI, Boosting Leadership in MLPerf Tests appeared first on NVIDIA Blog.  ( 3 min )
    Receive notifications for image analysis with Amazon Rekognition Custom Labels and analyze predictions
    Amazon Rekognition Custom Labels is a fully managed computer vision service that allows developers to build custom models to classify and identify objects in images that are specific and unique to your business. Rekognition Custom Labels doesn’t require you to have any prior computer vision expertise. You can get started by simply uploading tens of […]  ( 7 min )
    An optimized solution for face recognition
    When artificial intelligence is tasked with visually identifying objects and faces, it assigns specific components of its network to face recognition — just like the human brain.  ( 5 min )
    Does this artificial intelligence think like a human?
    A new technique compares the reasoning of a machine-learning model to that of a human, so the user can see patterns in the model’s behavior.  ( 7 min )
    DSC Weekly Digest: Moving Time
    For nine years, my family and I have lived in a house in Issaquah, a little community about twenty minutes east of Seattle. The town still retains its charms — a downtown area about three blocks long that includes a vintage (and long since decommissioned) gas station, numerous restaurants, a live theater, the library, and… Read More »DSC Weekly Digest: Moving Time The post DSC Weekly Digest: Moving Time appeared first on Data Science Central.  ( 7 min )
    Efficiently Initializing Reinforcement Learning With Prior Policies
    Posted by Ikechukwu Uchendu, AI Resident and Ted Xiao, Software Engineer, Robotics at Google Reinforcement learning (RL) can be used to train a policy to perform a task via trial and error, but a major challenge in RL is learning policies from scratch in environments with hard exploration challenges. For example, consider the setting depicted in the door-binary-v0 environment from the adroit manipulation suite, where an RL agent must control a hand in 3D space to open a door placed in front of it. An RL agent must control a hand in 3D space to open a door placed in front of it. The agent receives a reward signal only when the door is completely open. Since the agent receives no intermediary rewards, it cannot measure how close it is to completing the task, and so must explore …  ( 8 min )
    DALL·E 2
    DALL·E 2 is a new AI system that can create realistic images and art from a description in natural language.  ( 2 min )
    Sum the zeros of an analytic function without finding them first
    A couple days ago I wrote about how Vieta’s formulas let you sum the zeros of a polynomial without having to first compute the zeros. This is especially handy for high-order polynomials since there is no explicit formula for the zeros. Most functions that arise in applications are not polynomials. How could you find the […] Sum the zeros of an analytic function without finding them first first appeared on John D. Cook.  ( 3 min )
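    The teaser's polynomial starting point, for reference: Vieta's formulas read off the sum of a polynomial's zeros from its two leading coefficients, with no root-finding required.

    ```latex
    % For p(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0
    % with zeros r_1, \dots, r_n (counted with multiplicity):
    \sum_{i=1}^{n} r_i = -\frac{a_{n-1}}{a_n}
    % Example: x^2 - 5x + 6 = (x - 2)(x - 3), and indeed 2 + 3 = -(-5)/1.
    ```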

    Last Week in AI: AI improves algae for biofuel and carbon capture, more AI decision-making in the military, and more!
    submitted by /u/regalalgorithm [link] [comments]
    Researchers From Allen Institute for AI Introduce ‘MERLOT Reserve’: A Novel Multimodal Video Question Answering Model
    We humans navigate the environment using all of our senses. Allen Institute researchers propose MERLOT Reserve, a model that learns to represent videos over time and across several modalities, including audio, subtitles, and video frames. It was trained using a new learning objective and more than 20 million YouTube videos. MERLOT Reserve is a unique, cutting-edge methodology for answering video-related questions. MERLOT Reserve can dependably choose the correct answer from a selection of multiple-choice answers when given a video and a question. This prediction is made by MERLOT Reserve jointly reasoning over the visual frames of the video, the video subtitles, and the audio in the video. Continue reading this cool research update from AI2 Paper: https://arxiv.org/pdf/2201.02639.pdf Demo: https://merlot-reserve.apps.allenai.org/ Project: https://rowanzellers.com/merlotreserve/ Github: https://github.com/rowanz/merlot_reserve https://preview.redd.it/031i6ty6err81.png?width=1920&format=png&auto=webp&s=299569e12160eb991f35a2c6b41c5758ff027235 submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    EndlessVN open alpha today
    submitted by /u/roblox22y [link] [comments]
    AI Meets Quantum Technology in New Google Spinoff, Sandbox AQ - News
    submitted by /u/allaboutcircuits [link] [comments]
    Best undergraduate major besides computer science for pursuing a career in artificial intelligence?
    Hi, all. I got accepted into my top choice of college as an undecided major. Recently, I have decided to pursue artificial intelligence! Unfortunately, it is near impossible to transfer into computer science at my particular university. I was wondering if I can still pursue AI as a career if I complete one of the following majors: Mathematics, Information or Data Science, Statistics, or Linguistics. Additionally, I could pursue one of these and minor in another; I should be able to minor in computer science as well if necessary. Hopefully, my choice of major will allow me to pursue research or an internship in artificial intelligence. I am willing to take additional summer courses and pursue relevant certifications to ensure that I am up to par with my computer science colleagues. (posted on behalf of a family member) submitted by /u/runelagoon [link] [comments]  ( 2 min )
    Comparing old and new AI voices from Replica Studios (new in second half)
    submitted by /u/autumns [link] [comments]
    Super-res model/program comparison
    I upscaled an image with a few different super-res models and programs; pick your favorite! https://files.botbox.dev/superrestestcollage.png Because of how Reddit is, I can't make this a poll, so comment your pick. Animated original version: https://www.youtube.com/watch?v=zRaTwVuqd70 (I will also make an animated version upscaled with the most-voted model/program) submitted by /u/Recent_Coffee_2551 [link] [comments]
    solo voiceovers
    I am looking for something to change my voice in a way that is more satisfactory and more convincingly varied than what simple voice modulation software can achieve and as cheaply as is possible (preferably free). Use case: I have been working on an animated movie to which I am the sole contributor. Though I have been putting it off while looking for an appropriate solution, the time has come to voice my various characters, who are a range of ages, both male and female. For several reasons, I am interested in voicing them all myself while doing the facial motion captures as well. What I am in need of is, essentially, something that does exactly what Respeecher does, but without the $200/month sub fee. I would love to be in a position to simply pay them what they are asking for in exchange…  ( 2 min )
    Artificial Nightmares: Frenzied Flame || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    AI that takes multiple songs as input, and then generates a similar song or song with similar elements?
    I have been searching for a music AI that takes mp3 or midi files as input, but haven't been successful yet. Is there such a thing? If not, is such a thing feasible? submitted by /u/16pxl [link] [comments]  ( 1 min )
    [D] Why aren't new LLMs using the Perceiver architecture?
    Perceiver and PerceiverIO (https://arxiv.org/abs/2107.14795) appear to offer significantly improved FLOP efficiency, but new LLMs (including Deepmind's own Gopher) don't use it. What gives? Is it still too new, or is the Perceiver architecture not appropriate for LLMs? submitted by /u/deeceeo [link] [comments]
    [R] Meta-Learning Machines in a Single Lifelong Trial: lecture video (24 min) presented at meta-learning workshops at ICML 2020 and NeurIPS 2021 (Schmidhuber YouTube Talk)
    Saw this posted on Schmidhuber's Twitter: Meta-Learning Machines in a Single Lifelong Trial: lecture video (24 min) presented at meta-learning workshops at ICML 2020 and NeurIPS 2021. URL of talk: https://youtu.be/2GgGVdkq2bU Abstract The most widely used machine learning algorithms were designed by humans and thus are hindered by our cognitive biases and limitations. Can we also construct meta-learning algorithms that can learn better learning algorithms so that our self-improving AIs have no limits other than those inherited from computability and physics? This question has been a main driver of my research since I wrote a thesis on it in 1987. In the past decade, it has become a driver of many other people's research as well. Here I summarize our work starting in 1994 on meta-reinforcement learning with self-modifying policies in a single lifelong trial, and - since 2003 - mathematically optimal meta-learning through the self-referential Gödel Machine. This talk was previously presented at meta-learning workshops at ICML 2020 and NeurIPS 2021. Many additional publications on meta-learning can be found at https://people.idsia.ch/~juergen/metalearning.html submitted by /u/hardmaru [link] [comments]  ( 1 min )
    [D] Hyperparameter Tuning: does it even work?
    Hi *, I've been working for the last 5 years as Data Scientist. During this time I have tried dozens of times to improve my models via hyperparameter tuning, but I've never got improvements from there. I've tried all the possible approaches: grid search, random search, bayesian search, etc. But in no case did I get satisfactory results. Does this happen to anyone else? Have you ever got robust improvements via HP tuning? submitted by /u/AM_DS [link] [comments]  ( 1 min )
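    For what it's worth, on low-dimensional, reasonably smooth tuning problems even plain random search usually does land near the optimum; if tuning never moves the needle, the model may simply be insensitive to its hyperparameters over the ranges searched. A self-contained toy illustration of log-uniform random search (the objective is synthetic, purely for demonstration):

    ```python
    import math
    import random

    def objective(lr):
        # Synthetic "validation loss" with its optimum at lr = 1e-2.
        return (math.log10(lr) + 2.0) ** 2 + 0.5

    def random_search(n_trials, seed=0):
        """Log-uniform random search over a learning rate in [1e-5, 1]."""
        rng = random.Random(seed)
        best_lr, best_loss = None, float("inf")
        for _ in range(n_trials):
            lr = 10.0 ** rng.uniform(-5.0, 0.0)  # sample uniformly in log space
            loss = objective(lr)
            if loss < best_loss:
                best_lr, best_loss = lr, loss
        return best_lr, best_loss

    best_lr, best_loss = random_search(60)
    print(f"best lr = {best_lr:.4g}, loss = {best_loss:.4f}")
    ```

    With 60 trials over five decades, the best sampled lr is almost surely close to 1e-2 here; when that doesn't happen on a real model, the loss surface is likely flat in the searched region.
    
    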
    [D] Autoregressive model for graph generation?
    Autoregressive models like GPT-2 do fairly well in text generation. Is it possible to do the same for graph data? A transformer based model Graphormer has recently shown its effectiveness in graph representation learning. Is there any way I can train Graphormer or any other model to generate graphs from an initial graph context? submitted by /u/ratt_m [link] [comments]
    [D] How do you guys hear about the latest papers?
    Hi! I'm a first-year grad student in computer vision and I am trying to get caught up on the latest research in my field. It seems like everyone in CS has heard about all of the latest papers, but I just have no idea how. My knowledge is limited to general ideas, and I don't know any specific papers unless they have like 20000+ citations. So my question is: how do you hear about these papers and get caught up? Is there a reference somewhere that puts together a list of all the "must-read" papers that have come out? I feel like I am already 5 years behind in my knowledge. It would be great if there was something like "Top 5 papers of the week" that I could read to stay on top of things. Also, this doesn't just apply to vision. I would like to have an idea of the other major developments in other fields (like NLP, general ML/DL, etc.) since I think that can carry over to my field. Thanks! Looking forward to your replies submitted by /u/TobusFire [link] [comments]  ( 2 min )
    [D] Fake authors and paper riders
    Based on my experiences in both academia and industry, I see that many researchers get listed as authors on papers solely for having attended the relevant project meetings, despite not contributing anything substantial to the work. I know of several people who've gotten on dozens of papers this way, despite not being able to explain the main details behind many of the papers they "co-authored." Of course, they can then claim credit for the work publicly as well as have their academic profile benefit from the citations accrued by the work. I've noticed that typically, these people are initially invited onto the project because they are on chummy terms with someone on the project. Concerningly, the more someone successfully "paper-rides" this way, the stronger their publication record looks, which makes it easier for them to find their way onto more projects to paper ride in the future. It seems that the obsessive focus on paper counts and citations has encouraged the rise of intellectually dishonest strategies for maximizing one's academic footprint. The huge research scientist salaries at top industry labs, which similarly obsess over paper counts and citations in their hiring process, only amplifies the incentive for paper riding. The reason I think it is bad: As more people paper ride, co-authorship on a paper gradually becomes a worse indication of expertise. Not to mention, paper riders are intellectually dishonest, by claiming credit for research that they didn't significantly contribute to. In a sense, it seems like a roundabout form of plagiarism. I know some might disagree with this take, as some people believe in being as generous about co-authorship as possible. I find that mindset to create the perfect environment for paper riders to flourish. I'm wondering if you've also seen paper riding happen and whether you think this behavior is good or bad. submitted by /u/alwayshumming [link] [comments]  ( 7 min )
    [D][R] Generate random sample for exponentiated Weibull distribution
    Hi there experts, I have a real distribution for which I ran this scipy script to detect the best fit: However, the script outputs 4 parameter values, and the best fit is actually an exponentiated Weibull distribution. Now I am clueless about how to generate a sample of n values from it. I know how to do this for the normal distribution, treating the fitted params as mean and sigma, but how do I generate such a list here? Please help. https://preview.redd.it/79n28icmsqr81.png?width=1141&format=png&auto=webp&s=d9478691c06f5cdfe03af4f82db8293443e91f1e submitted by /u/GoldenDew9 [link] [comments]  ( 1 min )
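    I can't see OP's script, but assuming the four fitted values are scipy's (a, c, loc, scale) parameters for scipy.stats.exponweib, the direct answer is scipy.stats.exponweib.rvs(a, c, loc=loc, scale=scale, size=n). The same thing can also be done by hand with inverse-transform sampling, since the exponentiated Weibull CDF F(x) = (1 - exp(-((x - loc)/scale)^c))^a inverts in closed form. A stdlib-only sketch (parameter values are hypothetical):

    ```python
    import math
    import random

    def exponweib_rvs(a, c, loc=0.0, scale=1.0, n=1, seed=None):
        """Draw n samples from the exponentiated Weibull distribution by
        inverse-transform sampling: solve F(x) = u for uniform u in [0, 1)."""
        rng = random.Random(seed)
        samples = []
        for _ in range(n):
            u = rng.random()
            # Inverting F(x) = (1 - exp(-((x - loc)/scale)**c))**a for x:
            x = loc + scale * (-math.log(1.0 - u ** (1.0 / a))) ** (1.0 / c)
            samples.append(x)
        return samples

    # Hypothetical fitted parameters, for illustration only.
    draws = exponweib_rvs(a=2.0, c=1.5, scale=3.0, n=5, seed=42)
    print(draws)
    ```

    The same inversion works for any distribution whose CDF has a closed-form inverse; for ones that don't, scipy's rvs falls back to numerical inversion internally.
    
    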
    [R][D] VAE Embedding Space - Can we force it to learn a metric?
    I understand that certain AE types such as the β-VAE disentangle certain aspects of variation in the data, and those such as the conditional AE or VAE allow us to separate these aspects with labels. However, what I have seen is that the embedding space doesn't cluster the images as well as some contrastive methods do, while contrastive methods require inelegant negative sampling, etc. Can we somehow force the VAE to learn both the variational lower bound and a good metric between samples, such that visually similar samples are better clustered together? submitted by /u/jim_from_truckistan [link] [comments]  ( 1 min )
    [D] Jetson AGX Orin dev kit as a stand-alone training platform
    The Jetson Orin 64gb model has "275 Sparse|138 Dense INT8 TOPS", and I am a little confused about how to compare this to something like the RTX a6000's performance. I am looking to do deep rl training and am new to the field. What metrics make a difference for deep rl? Any thoughts on the Orin dev kit's ability to train deep rl? submitted by /u/here_to_create [link] [comments]  ( 1 min )
    [D] With the rise of AutoML, what are the important skills for a ML career?
    Some time down the road, when AutoML becomes more established, it can help us determine the best ML model and hyperparameters for a particular problem. This will not replace data scientists, as we still need data scientists for their domain knowledge, which is critical for scoping business problems, pre-processing data, and deriving business insights from the trained model. However, since data scientists may no longer need to deal with the technicalities of a model in the near future (i.e. they no longer have to tune hyperparameters, determine the best optimisation function, etc.), is there still a need for aspiring data scientists to learn about the intricacies and nuances behind the various models (maybe by coding the models from scratch)? Or is it enough for them to learn how to operate an AutoML system? (My question refers to the corporate world in general and not to academia.) Thanks in advance for your answers :) submitted by /u/smart_oinker [link] [comments]  ( 7 min )
    [P] AutoML-Conf Competition: DAC4AutoML
    Hi everyone! We've just launched a competition at the AutoML-Conf 2022, the DAC4AutoML competition. It has two tracks, one for configuring a Computer Vision model and one for an RL pipeline: https://automl.github.io/dac4automlcomp/ And what is DAC exactly? It means we want to find well-performing hyperparameter configurations like in Algorithm Configuration, but we do it dynamically - thus DAC, Dynamic Algorithm Configuration. As to how that is supposed to happen? We don't put any restrictions on the solutions for the competition, so you can submit your hand-tuned static hyperparameter setting if you want. Or you can use some sort of heuristic, a regression model, reinforcement learning, ... whatever works. If you're interested in participating, you can submit from now until June 18 (AoE); the winners will be announced at the AutoML-Conf. submitted by /u/catsortion [link] [comments]  ( 1 min )
    [D] Imagenet Original Pictures
    As I understand it, ImageNet was generated from internet images, but I am unable to find the originals using naive image search. Is there any mapping? I wonder whether the ImageNet data consists of cropped versions of the original pictures or not; I don't see it in the paper. submitted by /u/LeanderKu [link] [comments]  ( 1 min )
    [P] UFO Lands on Highway! Or Depth Estimation using ML
    Article describing depth estimation using machine learning models and 3D visualization of depth maps using three.js. https://www.storminthecastle.com/posts/ufos_and_depth/ submitted by /u/CakeStandard3577 [link] [comments]
    [D] Could Stylegan-XL be great for out-of-domain generation?
    In the context of text-to-image generation, I'd say one of the reasons VQGAN is so widely used in popular notebooks is that it can deal with many concepts, while StyleGAN used to be limited to the domain it was trained on. That may be about to change with the rolling release of StyleGAN-XL weights trained on ImageNet. This notebook (https://github.com/CasualGANPapers/StyleGANXL-CLIP) has had nice results with objects never seen by the model, such as "apple" and "ant", as well as scenes such as "judo athletes fighting". Please note that the StyleGAN-XL weights are currently only available at 128x128 pixels. The ETA for the 256 resolution is April 14, 2022. submitted by /u/HrodRuck [link] [comments]  ( 1 min )
    [Discussion] Support Vector Machines... in 2022
    My post is inspired by this discussion. In that thread, OP asked why support vector machines are still taught. People offered several thoughts: they're easier to think about, they're still perfectly good for some real-world problems, and for some problems they apparently rival deep networks. I did a project for a class around six years ago using an SVM as implemented in scikit-learn. I was pretty satisfied with the project, but I also experienced some frustrations, and came away with some questions. I started working with Tensorflow and DNNs in earnest soon after finishing that project, and I largely stopped thinking about SVM. I would like to revive the questions I asked, but never answered, here. A DNN with multiple outputs can potentially use a single neuron in the prediction of more than one output. For multiple, mutually-exclusive categories, this makes good sense. An SVM with multiple outputs in scikit-learn was implemented as pairs of one-vs-one SVMs, each of which was independently fit to data. This gets inefficient quickly. Has this changed? Can it be changed? DNN training at scale is a problem that many people have worked hard to make practical. Even non-experts like myself use our home GPUs to accelerate training of DNNs on large data sets. In scikit-learn, SVM training was implemented in a single thread on one CPU core. If you are performing cross-validation or a hyperparameter optimization study, it might be practical to parallelize fitting; one thread for each distinct condition. But can you parallelize the SVM fitting algorithm for a single condition? I went looking for software, but I couldn't find anything. Over to you folks. Cheers. submitted by /u/aotus_trivirgatus [link] [comments]  ( 6 min )
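On the first two questions, a quick check against scikit-learn's current API (a sketch; as of recent versions, multiclass SVC is still one-vs-one internally and the single fit is still single-threaded, so the common workaround remains parallelizing across folds or configurations rather than within one fit):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Multiclass SVC remains one-vs-one under the hood: k classes ->
# k*(k-1)/2 independently fitted binary SVMs.
clf = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)
print(clf.decision_function(X[:1]).shape)  # (1, 3) for 3 classes

# libsvm's SMO solver is single-threaded per fit; parallelism is
# typically applied one level up, e.g. across cross-validation folds.
scores = cross_val_score(SVC(), X, y, cv=5, n_jobs=-1)
print(scores.shape)  # (5,)
```

For parallelizing a *single* fit, third-party GPU/multicore reimplementations exist outside scikit-learn, but the stock `SVC` has not changed in this respect.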
    [R] Restormer: Efficient Transformer for High-Resolution Image Restoration (CVPR 2022--ORAL) + Colab Demo + Gradio Web Demo
    Visual results: With Restormer, you can remove noise, motion blur, defocus blur, and rain streaks from your own images. Paper: https://arxiv.org/abs/2111.09881 Github: https://github.com/swz30/Restormer Colab Demo: https://colab.research.google.com/drive/1C2818h7KnjNv4R1sabe14_AYL7lWhmu6?usp=sharing Gradio Web Demo: https://huggingface.co/spaces/swzamir/Restormer submitted by /u/swz30 [link] [comments]  ( 1 min )
    [R] [D] Seq2seq model hyperparameters tuning
    Does anyone have any advice or research papers on what hyperparameters researchers use for their seq2seq models? I am interested in knowing whether hyperparameters such as dropout, recurrent dropout, batchnorm, etc., are even necessary when using a seq2seq model, but couldn't find anything on it for weeks. In that case, let's say using GridSearchCV, which hyperparameters do you tweak for your seq2seq model (other than the usual things like the number of neurons)? There is absolutely zero information on this for seq2seq models, and everyone just assumes that adding an attention mechanism solves everything without hyperparameter tuning. I have also looked at seq2seq code, and no hyperparameter tuning was shown whatsoever. FYI, this is in the context of time series data, using seq2seq, if that matters. Thanks submitted by /u/plsendfast [link] [comments]  ( 1 min )
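In the absence of published settings, one pragmatic route is a plain grid over the candidate regularizers. The sketch below uses a stdlib grid loop with a placeholder `train_and_score`; the hyperparameter names and ranges are illustrative, not recommendations from any paper:

```python
import itertools

# Hypothetical search space for a seq2seq model's regularization knobs.
search_space = {
    "dropout": [0.0, 0.2, 0.5],
    "recurrent_dropout": [0.0, 0.2],
    "teacher_forcing_ratio": [0.5, 1.0],
}

def train_and_score(config):
    # Placeholder: substitute fitting your encoder-decoder on the training
    # window and returning validation loss on a held-out window.
    return config["dropout"] + config["recurrent_dropout"]

keys = list(search_space)
best = min(
    (dict(zip(keys, values)) for values in itertools.product(*search_space.values())),
    key=train_and_score,
)
print(best)
```

For time series, the one non-negotiable detail is that the validation split inside `train_and_score` must be chronological (e.g. walk-forward), not a random shuffle.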
    [D] Has anyone seen any papers related to GANs which prove that the optimum remains unchanged when adding supervised loss (e.g. L1, L2)?
    I’ve been reading many papers lately pertaining to GANs, with more and more introducing supervised loss into the generator’s objective function. However, no one ever seems to show that the optimum remains undisturbed. Results seem to be strictly empirical most of the time. Has anyone seen any papers where it is shown that the disruption to the generator’s loss doesn’t harm convergence? submitted by /u/king_of_walrus [link] [comments]  ( 1 min )
    [D] What is your experience with Fake results or overfitted results being sold as awesome?
    I am curious what everyone's experience is with completely faked, falsified, or fabricated results in the area. Another aspect of this, I think, is people taking heavily overfitted results, finding one decent example from the test set, and claiming their method is awesome. How much of this have you seen, and how much of the research out there fits into this category? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
    Win tickets to The AI Summit London 2022
    Sponsored Post Join the UK’s most forward-thinking technologists and business professionals this June in a celebration of emerging technology. Machine […] The post Win tickets to The AI Summit London 2022 appeared first on Machine Learning Mastery.  ( 2 min )
    MIT has trained AI to generate new molecular materials
    submitted by /u/aidev2040 [link] [comments]
    Building a Data Products-centric Business Model
    When I was the Vice President of Advertiser Analytics at Yahoo!, I painfully learned that my targeted user personas (Media Planners & Buyers and Campaign Managers) didn’t want more data in helping them optimize their marketing, campaign, and advertising spend across the Yahoo! Ad Network.  Heck, they didn’t even want analytics!  The aspirations for these… Read More »Building a Data Products-centric Business Model The post Building a Data Products-centric Business Model appeared first on Data Science Central.  ( 5 min )
    Data Discovery for ML Engineers
    Real-world production ML systems consist of two main components: data and code. Data is clearly the leader, and rapidly taking center stage. Data defines the quality of almost any ML-based product, more so than code or any other aspect. In Feature Store as a Foundation for Machine Learning, we have discussed how feature stores are… Read More »Data Discovery for ML Engineers The post Data Discovery for ML Engineers appeared first on Data Science Central.  ( 9 min )
    Content Metrics That Can Help You To Write Dissertations
    Many things can help you write good dissertations. One of the most important is to use content metrics. All students need to understand content metrics in detail. A clear understanding of their types and measuring strategies helps you evaluate things in a precise way. Whatever is your topic… Read More »Content Metrics That Can Help You To Write Dissertations The post Content Metrics That Can Help You To Write Dissertations appeared first on Data Science Central.  ( 5 min )
    Comparative analysis of an Intel and AMD Processor
    The need of a highly functional and fast processing Central Processing Unit (CPU) in today’s world is not just mostly desired, but also mostly required due to the rapid digitalization across the globe. Whether you work on a personal computer (PC) unit or laptop, the necessity of a highly advanced processor is indispensable.  This is… Read More »Comparative analysis of an Intel and AMD Processor The post Comparative analysis of an Intel and AMD Processor appeared first on Data Science Central.  ( 3 min )
    AI And Its Impact On Diversity And Inclusion
    How does artificial intelligence Diversity, Equity, and Inclusion (DEI) fit into the technological stack of daily companies? Fostering a diverse workforce is a very human problem. The cry for a halt to race prejudice has become deafening, and it’s increasingly a decisive factor for talent when weighing job offers and purchases. To stay up with the… Read More »AI And Its Impact On Diversity And Inclusion The post AI And Its Impact On Diversity And Inclusion appeared first on Data Science Central.  ( 4 min )
    How Data Intelligence Platforms Promote Business Success
    Understanding consumer behavior is becoming more and more critical as businesses seek to find innovative ways to survive and thrive in a period of constant change. In the last few years, the market has seen significant changes in the way people shop, travel, dine and purchase goods. As a business, when it comes to understanding… Read More »How Data Intelligence Platforms Promote Business Success The post How Data Intelligence Platforms Promote Business Success appeared first on Data Science Central.  ( 4 min )
    Exploring AI labeling for children’s products
    I read an article from the world economic forum which proposed an AI labeling system for AI products designed for children Today, for the first time, children are growing up in a world shaped by artificial intelligence (AI) and decisions are being made for children implicitly by AI.  Algorithms need data that is collected and… Read More »Exploring AI labeling for children’s products The post Exploring AI labeling for children’s products appeared first on Data Science Central.  ( 3 min )
    Need project suggestions
    I’ve been running circles in tutorial purgatory and I want to get out of it with some projects. Does anyone have any suggestions? Guided ones would be nice. For unguided ones, could you please provide source links/hints? submitted by /u/HellVollhart [link] [comments]  ( 1 min )
    Agents learn a policy when sampling the last episode from the replay buffer, but don't when sampling randomly from the replay buffer
    Hi all. I've been stuck on this problem for a while and I thought I might be able to find some help here. Any kind of assistance would be greatly appreciated. My setup is as follows: I have an environment with 3 agents. All 3 agents share a single policy network, and it is based on CommNet. My goal is to implement a replay buffer for this environment. I verified that my replay buffer logic is good. I tried 3 different types of runs:
    1. Normal on-policy run: The agents perform an episode, and at the end of each episode the data (such as the states, actions, etc.) from this episode is used to calculate the loss.
    2. Using just the last episode from the replay buffer: The agents perform an episode, and the data is stored in the replay buffer. At the end of each episode, the last episode is sampled from the replay buffer (which is the episode that was just performed). This is just to confirm that my replay buffer is working properly, and the reward curve for this case matches that from (1).
    3. Using 1 random episode from the replay buffer: The agents perform an episode, and the data is stored in the replay buffer. At the end of each episode, a random episode is sampled from the replay buffer and used to calculate the loss. The performance is terrible in this case, and the environment times out each time.
    For some reason, as soon as I turn on random sampling, progress is really bad. I'm sorry to pose such an open-ended question, but what are some things I could check to pinpoint the source of this problem? What could be a reason why performance is as expected when just sampling the last episode, whereas it is terrible when randomly sampling episodes? I've tried some things thus far but nothing has worked, and I turned to this community in hopes of getting some help. I'm new to the area of reinforcement learning, so I would be very grateful for any kind of help you can offer. Thanks in advance submitted by /u/lebr0n99 [link] [comments]  ( 3 min )
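For concreteness, a minimal stdlib sketch of the episode buffer described above (names are mine, not the poster's code):

```python
import random
from collections import deque

class EpisodeReplayBuffer:
    """Stores whole episodes so recurrent/CommNet-style policies can replay
    full trajectories rather than shuffled individual transitions."""

    def __init__(self, capacity):
        self.episodes = deque(maxlen=capacity)

    def add(self, episode):          # episode: list of (state, action, reward, ...)
        self.episodes.append(episode)

    def sample_last(self):           # reproduces the on-policy behavior
        return self.episodes[-1]

    def sample_random(self):         # off-policy: data from an older policy
        return random.choice(self.episodes)
```

One thing worth checking that this sketch makes visible: if the loss being computed is an on-policy policy-gradient loss, then episodes drawn with `sample_random` come from older policies that the loss is not valid for, which is a common reason run (3) fails while run (2) works. Replay buffers generally require an off-policy algorithm or an importance-weighting correction.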
    PPO sample correlation?
    Hi, I'm wondering if the PPO algorithm can solve the sample correlation problem of on-policy algorithms in training. PPO uses successive samples to compute GAE; doesn't the sample correlation occurring here interfere with learning? submitted by /u/noisemastar [link] [comments]
    Temporal Difference Learning for Model Predictive Control
    submitted by /u/bendee983 [link] [comments]
    Why is there no rollout monitoring for this CustomEnv (on the right) ?
    Output from using model.learn(env) on both Envs. On the left I have a simple dummy CustomEnv (using Stable-Baselines3 with Gym) for testing, and on the right I have the actual CustomEnv that I am working on in a project. As you can see, the dummy environment gives me the rollout monitoring, whereas there is no rollout monitoring for the actual environment (just time + train statistics/monitoring). I am using very similar code when setting up the training of the model; however, the complexity of the actual model is significantly higher than the dummy's. In theory, the complexity of the environment shouldn't make a big difference to the monitoring, right? All of the key parts are still there (reward function, step function, reset function, etc.). In both cases it says that the environments are being wrapped by the 'Monitor' wrapper, so that can't be it. Does anyone know why this might be happening? submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    Value Iteration in Car Racing V1
    I’m working on a Q-table learning model for OpenAI’s Car Racing. I have everything done as regards a basic agent, but I’m unsure how I’m supposed to use the Box data for the action space and observation space to populate a Q-table. Or is this approach incorrect? Car Racing doesn’t have a P (probability) call, so I’m not sure how else I would do value iteration. submitted by /u/Dzartovian94 [link] [comments]  ( 1 min )
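If the Box observation is low-dimensional, the standard tabular trick is to discretize each dimension into bins and use the resulting tuple as the Q-table key. A sketch with made-up bin edges follows; note that CarRacing's observation is a 96x96x3 image, so a raw Q-table is impractical there without heavy feature reduction, and value iteration additionally needs the transition probabilities that this env does not expose:

```python
import numpy as np

# Made-up bin edges: pick them from your env's Box low/high bounds instead.
bins = [np.linspace(-1.0, 1.0, 9)] * 4   # 4 observation dimensions, 10 bins each

def to_state(obs):
    """Map a continuous Box observation to a hashable tuple of bin indices."""
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(obs, bins))

q_table = {}                             # dict keyed by discretized state tuples
state = to_state(np.array([0.0, -0.5, 0.99, -2.0]))
print(state)  # (5, 3, 8, 0)
```

With image observations, the usual alternatives are hand-crafted features over the frame or a function approximator (e.g. DQN) instead of a table.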
    Action probabilities all landing at zero in a few steps
    Hey guys, I am new to RL and walking through the Keras implementation of Actor Critic. As a variant of it, I am trying to learn a strategy for WORDLE. However, after a few runs, my action probabilities all go down to zero. Not sure what's happening. Does anyone have any insights or pointers? Attaching my code for reference. Thanks

    import pandas as pd
    import numpy as np
    import random
    import string
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    # Configuration parameters for the whole setup
    gamma = 0.9  # Discount factor for past rewards
    max_runs = 10000
    eps = np.finfo(np.float32).eps.item()  # Smallest number such that 1.0 + eps != 1.0

    my_file = open("", "r")
    content = my_file.read()
    content = lis…  ( 3 min )
    Any RL-related conferences right after NeurIPS '22?
    In case my NeurIPS submission gets rejected, lol. submitted by /u/Blasphemer666 [link] [comments]  ( 1 min )
    Robots dress humans without the full picture
    MIT researchers design a robot that has a trick or two up its sleeve.  ( 6 min )
    Reproducibility in Deep Learning and Smooth Activations
    Posted by Gil Shamir and Dong Lin, Research Software Engineers, Google Research Ever queried a recommender system and found that the same search only a few moments later or on a different device yields very different results? This is not uncommon and can be frustrating if a person is looking for something specific. As a designer of such a system, it is also not uncommon for the metrics measured to change from design and testing to deployment, bringing into question the utility of the experimental testing phase. Some level of such irreproducibility can be expected as the world changes and new models are deployed. However, this also happens regularly as requests hit duplicates of the same model or models are being refreshed. Lack of replicability, where researchers are unable to reproduce…  ( 9 min )
    Customize the Amazon SageMaker XGBoost algorithm container
    The built-in Amazon SageMaker XGBoost algorithm provides a managed container to run the popular XGBoost machine learning (ML) framework, with added convenience of supporting advanced training or inference features like distributed training, dataset sharding for large-scale datasets, A/B model testing, or multi-model inference endpoints. You can also extend this powerful algorithm to accommodate different requirements. […]  ( 5 min )
    Detect adversarial inputs using Amazon SageMaker Model Monitor and Amazon SageMaker Debugger
    Research over the past few years has shown that machine learning (ML) models are vulnerable to adversarial inputs, where an adversary can craft inputs to strategically alter the model’s output (in image classification, speech recognition, or fraud detection). For example, imagine you have deployed a model that identifies your employees based on images of their […]  ( 12 min )
    Unreal Engine and NVIDIA: From One Generation to the Next
    Square/Enix presents the fictional city of Midgar in Final Fantasy VII Remake at a filmic level of detail. Epic’s Fortnite bathes its environments in ray-traced sunlight, simulating how light bounces in the real world. And artists at Lucasfilm revolutionized virtual production techniques in The Mandalorian, using synchronized NVIDIA RTX GPUs to drive pixels on LED Read article > The post Unreal Engine and NVIDIA: From One Generation to the Next appeared first on NVIDIA Blog.  ( 4 min )
    Green Teams Achieve the Dream: NVIDIA Announces NPN Americas Partners of the Year
    A dozen companies today received NVIDIA’s highest award for partners, recognizing their impact on AI education and adoption across such industries as education, federal, healthcare and technology. The winners of the 2021 NPN Americas Partner of the Year Awards have created a profound impact on AI by helping customers meet the demands of recommender systems, Read article > The post Green Teams Achieve the Dream: NVIDIA Announces NPN Americas Partners of the Year appeared first on NVIDIA Blog.  ( 4 min )
    Bounding zeros of an analytic function
    The previous post looked at the problem of finding the zeros of a cubic polynomial. Assuming we’re going to use a numerical method to calculate the zero, the hard part is knowing where to tell the numerical method to look. That post showed how to use a change of variables to guarantee that the polynomial […] Bounding zeros of an analytic function first appeared on John D. Cook.  ( 2 min )
    Numerically finding roots of a cubic
    The analog of the quadratic formula for cubic equations is cumbersome. A lot of people naturally say “Forget all that. If I need to find the roots of a cubic, I’ll just use a numerical method like Newton’s method.” Sounds good. Where to start? But how do you know where to look for the roots? […] Numerically finding roots of a cubic first appeared on John D. Cook.  ( 3 min )

    Mathematics and piano tuning
    The following is a slightly edited version of a Twitter thread on @AlgebraFact. The lowest C on a piano is called C1 in scientific pitch notation. The C one octave up is C2 and so forth. Middle C is C4. The frequency of Cn is approximately 2^(n+4) Hz. This would be exact if C0 were […] Mathematics and piano tuning first appeared on John D. Cook.  ( 2 min )
    Computing functions of roots without computing roots
    Once in a while it’s necessary to calculate some function of the roots of a polynomial, and it may be possible to do this without first calculating the roots. Quadratics The quadratic formula gives explicit solutions to the equation ax^2 + bx + c = 0. The two solutions for x are x = (-b ± √d) / 2a, where d = b^2 - 4ac. The awkward part is taking the square root of […] Computing functions of roots without computing roots first appeared on John D. Cook.  ( 3 min )
    FWHM for a quadratic
    This post derives a result I needed recently. The derivation is simple but a little tedious, so I wanted to save it in case I need it again. Full width half maximum A common way to measure the width of a peak in a function f(x) is to find the place x0 […] FWHM for a quadratic first appeared on John D. Cook.  ( 2 min )
    Number slang and numbered lists
    Here’s a list of five numbers used as slang in various contexts:
    - Location (CB and police radio)
    - End of column (journalism)
    - Best wishes (ham radio)
    - All aircraft in area (US Navy)
    - I love you (text messages)
    The motivation for this post was an article Those HTML attributes you never use. I wanted to make a […] Number slang and numbered lists first appeared on John D. Cook.  ( 1 min )
    PPO Alg confusion
    As I read the paper and several tutorials, I am quite confused about the details. I see many implementations scale the running cumulative discounted reward, but each of them does it in a different way. Let R be the running cumulative discounted reward; which variant is considered best, or is there a source describing the method to use? Implementations I saw in different places include:
    1. use R directly to calculate the advantage and to train the value network (most PPO tutorials use this)
    2. use R / std(R), where std is the mini-batch standard deviation
    3. use R / std(R), where std is the running standard deviation
    4. use (R - mean(R)) / std(R), where both mean and std are mini-batch statistics
    5. use (R - mean(R)) / std(R), where both mean and std are running statistics
    6. any of the above, followed by clipping to a certain range ([-10, 10] or [-1, 1])
    I also see several different ways to handle the value network. Let V be the output of the value network:
    1. output the raw logit, without any scaling or output activation (most PPO tutorials use this)
    2. output the raw logit, but apply the same scaling as discussed above for the running cumulative discounted reward; for example, if the return value is R / std(R), the value output will be V / std(R)
    3. the same as 2, but using the statistics of V instead of R for scaling; for example, if the return value is R / std(R), the value output will be V / std(V)
    4. output with tanh activation at the last layer
    5. output with tanh activation at the last layer, multiplied by a constant to match the range of the return
    Any help would be appreciated, thanks! submitted by /u/seermer [link] [comments]  ( 2 min )
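For concreteness, one of the variants mentioned above — normalizing R by a running mean and standard deviation that persist across batches — can be sketched with Welford's online algorithm. This is an illustration of the bookkeeping only, not a claim that this variant is the best one:

```python
import numpy as np

class RunningNorm:
    """Running mean/std of returns via Welford's online algorithm."""

    def __init__(self, eps=1e-8):
        self.n, self.mean, self.m2, self.eps = 0, 0.0, 0.0, eps

    def update(self, xs):
        for x in xs:
            self.n += 1
            d = x - self.mean
            self.mean += d / self.n
            self.m2 += d * (x - self.mean)

    def std(self):
        return float(np.sqrt(self.m2 / max(self.n - 1, 1))) + self.eps

    def normalize(self, xs, center=True):
        xs = np.asarray(xs, dtype=np.float64)
        return (xs - self.mean) / self.std() if center else xs / self.std()
```

Dropping `center=True` gives the scale-only variant (R / std(R)); computing statistics per mini-batch instead of persisting them gives the mini-batch variants.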
    I’m completely new to RL and will be building my first model as part of my degree-ending project. Do you have any tips you can provide?
    Hello all, As the title describes, I’ll be making my first model as part of my final project. I still have a pretty high-level understanding of everything, so forgive any inaccuracies as I describe what I’m going for. The problem I’m attempting to solve is known as the traveling salesman problem. Essentially, a route needs to be formulated that stops at n given locations. Finding the most efficient route with many stops by exhaustive search is impractical because the number of possible routes grows factorially with each added location. The environment will simulate travel on city roads. Speed will be a constant, set to whatever the road's speed limit is. I am using .pbf format vector GIS data from OSM so that the environment consists of real-world pathways. I’m using GeoPandas and Pyrosm to work with the data, and I’m collecting nodes for the locations of gas stations so that the environment can simulate needing to fuel the vehicle. Gas price will be constant, as will vehicle fuel efficiency. Scoring will be based on the calculated time it would take to complete a route and the calculated cost (in gas). The goal will be to find the most efficient route to take when n = some large number. I’ve never worked with spatial data either, so I’m not sure what kind of challenges that poses. I worry that adding nodes for the locations of gas stations might be difficult. I’m also wondering if I’m better off using TensorFlow and Keras for this, but I’m not really aware of all the technical considerations I should be making before deciding on that. Do you have any tips that might help me out? Solutions to problems I haven’t hit just yet, but likely will? Thanks for your help! submitted by /u/professorDissociate [link] [comments]  ( 2 min )
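One practical tip that follows from the description above: prototype the environment loop on a toy graph before wiring in the OSM data, so the RL plumbing can be debugged independently of GeoPandas/Pyrosm. A minimal gym-free sketch (all names and the `travel_time` table are invented, and fuel is omitted for brevity):

```python
# Minimal TSP-style episodic environment:
# state = (current node, frozenset of unvisited stops), reward = -travel time.
class TSPEnv:
    def __init__(self, travel_time):
        self.travel_time = travel_time           # dict[(i, j)] -> minutes
        self.n = max(j for _, j in travel_time) + 1

    def reset(self):
        self.current, self.unvisited = 0, set(range(1, self.n))
        return self.current, frozenset(self.unvisited)

    def step(self, action):
        reward = -self.travel_time[(self.current, action)]
        self.current = action
        self.unvisited.discard(action)
        done = not self.unvisited
        return (self.current, frozenset(self.unvisited)), reward, done
```

Once this shape works with a simple agent, the `travel_time` dict can be replaced by shortest-path times precomputed from the OSM road graph, and a fuel level can be added to the state tuple.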
    Best implementations for extensibility?
    As far as I am aware, StableBaselines3 is the gold standard for reliable implementations of most popular/SOTA deep RL methods. However, having worked with them in the past, I don't find them to be the most usable when looking for extensibility (making changes to the provided implementations), due to how the code base is structured behind the scenes (inheritance, lots of helper methods and utilities, etc.). For example, if I wish to change some portion of a method's training update with SB3, it would probably involve overloading a class method before initialization, making sure all the untouched portions of the original method are carried over, etc. Could anyone point me in the direction of implementations that are more workable from the perspective of extensibility? Ideally implementations that are largely self-contained in a single class/file, aren't heavily abstracted away across multiple interfaces, don't rely heavily on utility functions, etc. submitted by /u/Farconion [link] [comments]  ( 1 min )
    Is it possible to use inspect.getcallargs to convert *args and **kwargs to a canonical kwarg representation in RL?
    Given a NN class, is there something specific we need to care of when converting *args and **kwargs to a canonical kwarg representation? I ask this because in this code from Google (https://github.com/google-research/google-research/blob/c56b47713b08c95ad427d5f93ee0dbb9ad008964/social_rl/multiagent_tfagents/joint_attention/attention_networks.py#L557) they use a TFDecorator-aware replacement for inspect.getcallargs, instead of using getcallargs directly. So my questions are: - Is it possible to use inspect.getcallargs to convert *args and **kwargs to a canonical kwarg representation? - If no, is there an equivalent in PyTorch? I couldn't find any, so I was wondering how people go about that. submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
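On the first question: yes, in plain Python both `inspect.getcallargs` and the newer `Signature.bind` canonicalize `*args`/`**kwargs` against a function's signature; the TFDecorator-aware wrapper in that Google code exists because TF-decorated functions can hide the real signature. A stdlib-only illustration, with `forward` as a made-up function (the mechanism is framework-agnostic, so the same works for a PyTorch module's `forward`):

```python
import inspect

def forward(x, scale=2.0, **kwargs):   # stand-in for a network's forward()
    return x * scale

# Older API: maps positional and keyword args onto parameter names.
canon = inspect.getcallargs(forward, 3.0, scale=4.0, mode="train")

# Recommended replacement: Signature.bind.
bound = inspect.signature(forward).bind(3.0, scale=4.0, mode="train")
bound.apply_defaults()

print(canon)                  # {'x': 3.0, 'scale': 4.0, 'kwargs': {'mode': 'train'}}
print(dict(bound.arguments))  # same canonical mapping
```

PyTorch has no special equivalent because it does not need one: `torch.nn.Module.forward` is an ordinary Python method, so `inspect.signature(module.forward).bind(...)` works directly.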
    openAI gym return done==True but not seeing goal is reached
    Hi all, I am running some starter code from OpenAI Gym (FetchReach-v1, FetchPush-v1) with env.action_space.sample(). But I don't see the goal actually being achieved when the returned done is True. I copied the code from here (https://openai.com/blog/ingredients-for-robotics-research/). I even let it sleep at every step to watch more closely. Another related thing that I can't explain is that it always returns done==True rather quickly, with very few sampled actions. All of this makes me worried about using it as my task environment. submitted by /u/AnimatorRemarkable20 [link] [comments]  ( 1 min )
    Which elective: Monte Carlo Simulation or Computational Learning Theory?
    Hello /r/reinforcementlearning. I have to choose electives pretty soon, and as I am interested in reinforcement learning, I wanted to know which of these would be the most beneficial:
    - Monte Carlo Simulation
    - Computational Learning Theory
    The year after, I will also take a course on Reinforcement Learning, but it has not been created yet. Note: I can also take both if recommended; if I do so, I will take one of the courses before taking the RL course, and the other at the same time. Some further thoughts I've had: CLT includes bandits, which is surely useful to know, but it seems to be only a rather small part, and I'm unsure whether all the other topics like PAC Learning and Rademacher Bounds are useful. MC is more practical while CLT is more theoretical (apparently VERY theoretical, according to the course description above). I am not afraid of theoretical courses, but I struggle more with them than with more practical courses. The sentiment around the MC course is that it is pretty good. I don't know anyone who has taken the CLT course. If I choose both, which order would you take them in? submitted by /u/John_Hitler [link] [comments]  ( 2 min )
    Ray RL lib observations normalized?
    Hi, I am using RLlib from Ray and I don't know whether observations are automatically normalized by the library or not. When creating a custom environment, Ray wants you to define an observation space, which is a Gym Box in my case. Anyway, I don't know the exact high and low values; my values lie between -1 and 1, more or less. My fear is that Ray would normalize the observation values to a new range although they are already processed. Does Ray normalize the observation space? If yes, how can I turn it off? Thanks! submitted by /u/Willing-Classroom735 [link] [comments]  ( 1 min )
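For reference, a sketch of where this is controlled in RLlib (the key name is taken from RLlib's common trainer config around the 2022-era Ray releases; verify against your version's docs). To my understanding, the default `observation_filter` is `"NoFilter"`, i.e. RLlib does not normalize observations unless you opt in, and setting the key explicitly makes the behavior unambiguous:

```python
# Fragment of an RLlib trainer config (env name is a placeholder):
config = {
    "env": "MyCustomEnv",
    # "NoFilter" (default): observations are passed through unchanged.
    # "MeanStdFilter": running mean/std normalization, if you ever want it.
    "observation_filter": "NoFilter",
}
```

So for values already processed into roughly [-1, 1], leaving (or setting) `"NoFilter"` should keep them untouched.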
    [CfP] EvoRL @ GECCO 2022. One week before the deadline!
    CALL FOR PAPERS EvoRL 2022 Evolutionary Reinforcement Learning workshop at GECCO 2022, July 9-13, Boston, USA In recent years reinforcement learning (RL) has received a lot of attention thanks to its performance and ability to address complex tasks. At the same time, multiple recent papers, notably work from OpenAI, have shown that evolution strategies (ES) can be competitive with standard RL algorithms on some problems while being simpler and more scalable. Similar results were obtained by researchers from Uber, this time using a gradient-free genetic algorithm (GA) to train deep neural networks on complex control tasks. Moreover, recent research in the field of evolutionary algorithms (EA) has led to the development of algorithms like Novelty Search and Quality Diversity, capable of…  ( 2 min )
    How does PPO deal with episodes of variable length?
    In the paper it is written to collect trajectories of length T, then calculate the advantage and train the actor and critic networks. My question: suppose one episode ends well before T. If I keep running that episode up to length T, it will only collect negative rewards at each timestep, which makes training impossible because the return becomes a very large negative number. So what can be done instead? I might be getting it wrong, so please correct me in the comments. submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
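    What common PPO implementations do (a sketch of the usual practice, not necessarily the exact code behind the paper): episodes are not padded out to T with fake rewards. Several episodes are concatenated into one fixed-length rollout, and a `done` flag masks the bootstrap term in the advantage recursion, so nothing leaks across an episode boundary. A minimal Generalized Advantage Estimation loop with done-masking:

    ```python
    # GAE over a fixed-length rollout that may contain several episodes.
    # The done flag zeroes out the bootstrap term at each episode end, so
    # a short episode never accumulates phantom rewards up to T.

    def gae(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
        """values[t] = V(s_t); next_value = V(s_T), used to bootstrap the
        final, possibly unfinished, episode segment."""
        advantages = [0.0] * len(rewards)
        gae_acc = 0.0
        for t in reversed(range(len(rewards))):
            v_next = next_value if t == len(rewards) - 1 else values[t + 1]
            mask = 0.0 if dones[t] else 1.0  # kill bootstrap at episode end
            delta = rewards[t] + gamma * v_next * mask - values[t]
            gae_acc = delta + gamma * lam * mask * gae_acc
            advantages[t] = gae_acc
        return advantages

    # Two episodes packed into one rollout of T=4: the done flag at t=1
    # keeps the second episode's values from affecting the first.
    adv = gae(rewards=[1.0, 0.0, 1.0, 1.0],
              values=[0.5, 0.5, 0.5, 0.5],
              next_value=0.5,
              dones=[False, True, False, False])
    ```

    With this, a trajectory of length T is just a window over the stream of experience; episode length never needs to equal T.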
    Need help with OpenAI Gym custom environment: state representation as "observation"
    Hello, I'm making a custom OpenAI Gym environment to train various algorithms on, and I have encountered some issues. My .flatten() method on the state class returns a large integer which can be converted back into the same state object. However, when I return it as the observation from environment.reset() and environment.step(), testing gives: "AssertionError: The observation returned by the `reset()` method does not match the given observation space", which I can only "fix" by returning 0. How do I go about resolving this? And are there better approaches for training RL agents on a custom environment? ty! submitted by /u/snaredrum_merchant [link] [comments]  ( 1 min )
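    The likely cause of the assertion: a Box space expects an array of a fixed shape and dtype, not a single packed integer. Two common fixes are to declare `spaces.Discrete(n_states)` and return the integer ID, or to decode the ID back into a feature vector matching the Box shape. A hedged, dependency-free sketch of the second option (the sizes and functions below are illustrative, not the poster's actual class):

    ```python
    # Mixed-radix encode/decode between a state tuple and a single integer.
    # reset()/step() should return the decoded vector (cast to a float
    # array) when the observation space is a Box of shape (3,).

    SIZES = (4, 5, 3)  # hypothetical per-feature cardinalities

    def encode(state):
        """Pack a tuple of small ints into one integer (mixed radix)."""
        sid = 0
        for value, size in zip(state, SIZES):
            sid = sid * size + value
        return sid

    def decode(sid):
        """Invert encode(); returns the fixed-shape feature tuple."""
        out = []
        for size in reversed(SIZES):
            sid, value = divmod(sid, size)
            out.append(value)
        return tuple(reversed(out))

    assert decode(encode((3, 1, 2))) == (3, 1, 2)
    ```

    Returning `np.array(decode(sid), dtype=np.float32)` from reset() and step() should then match a Box whose shape and bounds cover those feature ranges.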
  • Open

    [D] Why do we still teach support vector machines?
    Honest question: are there any applications for which SVMs are the best choice? In my experience, no one seems to use this methodology anymore, though maybe I'm wrong. It just kinda feels like teaching people how to use a slide rule when everyone has calculators. submitted by /u/WartimeHotTot [link] [comments]  ( 3 min )
    [D] Paper Explained - Continual Backprop: Stochastic Gradient Descent with Persistent Randomness
    https://youtu.be/zEMOX3Di2Tc This paper identifies what seems to be a new phenomenon in the continual learning/life-long learning domain: if new tasks are continually introduced to an agent, it seems to lose its ability to learn as time progresses. Intuitively it's similar to the idea that "an old dog can't learn new tricks". They propose a fairly simple method of overcoming this limitation that involves resetting weights that are not contributing much to the output of the network. They call the method Continual Backprop. Outline: 0:00 - Overview 2:00 - Paper Intro 2:53 - Problems & Environments 8:11 - Plasticity Decay Experiments 11:45 - Continual Backprop Explained 15:54 - Continual Backprop Experiments 22:00 - Extra Interesting Experiments 25:34 - Summary Paper link: https://arxiv.org/abs/2108.06325 submitted by /u/SlickBlueML [link] [comments]  ( 1 min )
    [R] Google's 540B (Dense) model Pathways LLM, "Unlocks" new tasks proportional to scale
    Blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Paper: https://goo.gle/palm-paper - AFAIK from the blog post, scaling laws still hold up (i.e., not yet plateaued) - New transfer learning capabilities: outperforms fine-tuned models (Codex-12B) with 50x less data - The interesting part is how it handles techy, geeky jokes and is able to correlate concepts and explain jokes, suggesting a bit more meta-learning than GPT-3 ever managed... but still not enough to generate decent ones (though the example joke wasn't particularly humorous, so I may be underestimating it) - SoTA on various tasks; chain-of-thought reasoning still holds up under scaling and outperforms some reasoning benchmarks; BIG-bench sees a huge improvement; and general LLM things :) submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 4 min )
    [R] Minimum Description Length Recurrent Neural Networks
    https://arxiv.org/abs/2111.00600 submitted by /u/inland-1 [link] [comments]  ( 1 min )
    [P] Looking for a dataset
    Hey! New here. I logged back into Reddit after years just to ask this question on this forum. I need to test a model, based loosely on BERT, that classifies a piece of text as having a right- or left-leaning political ideology and whether it promotes any racial or religious stereotypes. For training we used SBIC, IBC, and StereoSet, though these only contain short sentences labeled as belonging to just one of the above categories. Is anyone aware of another dataset that can be used for this purpose, ideally text labeled with a political leaning (left/right, conservative/liberal, neutral) and additionally with any racial or religious stereotypes? Very thankful in advance. submitted by /u/Fee_Imaginary [link] [comments]  ( 1 min )
    [P] Random Relational Graph Convolutional Networks (RR-GCN)
    📑 The Random R-GCN code has just been released! 📝 With just a few lines of code, you can now create embeddings of entities in a Knowledge Graph. 💡 RR-GCN does not require training and is competitive with fully trained R-GCNs. 👉 https://github.com/predict-idlab/RR-GCN submitted by /u/givdwiel [link] [comments]
    [R] DiffusionCLIP: Text-Guided Diffusion Models for "Robust" Image Manipulation (CVPR 2022)
    submitted by /u/ImBradleyKim [link] [comments]  ( 1 min )
    [P] Transformers for Software Engineers
    submitted by /u/hardmaru [link] [comments]
  • Open

    Logging in Python
    Logging is a way to store information about your script and track events that occur. When writing any complex script […] The post Logging in Python appeared first on Machine Learning Mastery.  ( 22 min )
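    In the spirit of the linked tutorial, a minimal sketch of the standard library's logging setup: configure once, then record events at different severity levels instead of sprinkling print() calls. (Logging to a StringIO here only so the output is capturable; a real script would use a file or stderr.)

    ```python
    import io
    import logging

    stream = io.StringIO()  # stand-in log target for demonstration
    logging.basicConfig(
        stream=stream,
        level=logging.INFO,  # DEBUG messages are filtered out at this level
        format="%(levelname)s:%(name)s:%(message)s",
    )
    log = logging.getLogger("my_script")

    log.debug("loaded config")         # suppressed: below INFO
    log.info("processing 3 records")   # recorded
    log.warning("record 2 malformed")  # recorded

    output = stream.getvalue()
    ```

    Raising the level to DEBUG during development and back to INFO in production is the usual workflow; the format string and handler can be changed without touching any of the log calls.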
  • Open

    Build an MLOps sentiment analysis pipeline using Amazon SageMaker Ground Truth and Databricks MLflow
    As more organizations move to machine learning (ML) to drive deeper insights, two key stumbling blocks they run into are labeling and lifecycle management. Labeling is the identification of data and adding labels to provide context so an ML model can learn from it. Labels might indicate a phrase in an audio file, a car […]  ( 7 min )
    Enable Amazon Kendra search for a scanned or image-based text document
    Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization. Amazon Kendra supports a variety of document formats, […]  ( 5 min )
    Interpret caller input using grammar slot types in Amazon Lex
    Customer service calls require customer agents to have the customer’s account information to process the caller’s request. For example, to provide a status on an insurance claim, the support agent needs policy holder information such as the policy ID and claim number. Such information is often collected in the interactive voice response (IVR) flow at […]  ( 6 min )
  • Open

    School of Engineering welcomes Thomas Tull as visiting innovation scholar
    Primary focus will be to advance and promote technology, innovation, and entrepreneurship across the school.  ( 4 min )
  • Open

    Microsoft Researchers Introduce ‘Jigsaw’: An AI Tool To Augment Large Language Models (GPT-3, Codex, etc.) By Deploying Post-Processing Techniques That Understand The Programs’ Syntax And Semantics
    GPT-3, Codex, and other sizable pre-trained language models can be adapted to create code from natural language descriptions of programmer intent. Every developer in the world might benefit from these automated models, which have the potential to increase productivity. However, because the models may fail to understand program semantics, the quality of the generated code cannot be guaranteed. Microsoft researchers introduce Jigsaw, a new tool that can help these large language models perform better. Jigsaw is a Python Pandas API code generator that accepts multi-modal inputs. It uses post-processing techniques that understand the syntax and semantics of programs, and then uses user feedback to improve future performance. Continue Reading Paper: https://arxiv.org/pdf/2112.02969.pdf Dataset: https://github.com/microsoft/JigsawDataset submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
    submitted by /u/nick7566 [link] [comments]
    Generative AI+Alex Grey = xxxxxoooooooo (Disco Diffusion)
    submitted by /u/JoshGrambo [link] [comments]
    UiPath extract Tables from PDF (use case) (PDF table)
    submitted by /u/Cristi_UiPath [link] [comments]
    New RL technique achieves superior performance in control tasks
    submitted by /u/bendee983 [link] [comments]
    Metrics: Matthew's correlation coefficient
    submitted by /u/TheTesseractAcademy [link] [comments]
    12 Graphs That Explain the State of AI in 2022
    submitted by /u/Tao_Dragon [link] [comments]
    What is WhatsApp Business API? How can it Help your Business?
    submitted by /u/mihircontra20 [link] [comments]
    Voice copying/cloning
    Hi all, I don't know if this is the right subreddit, but here goes... I'm looking to voice clone my father. He passed recently, and while it has been difficult for all of us, it's been especially hard for my mother, who married him early and was with him for 50 years. Her birthday is coming up, and I'd love to create a 5-10 second sound byte of him for her. Fortunately, there are likely to be lots of recordings of his voice around; part of his job was speaking and instructing. So, is there any way this is possible, without great difficulty, that would produce an accurate result? I have thought quite a bit about the ethics of recreating his deceased voice. However, I feel that it's for his soulmate who's struggling, whom he had no qualms spending his life and travelling abroad with, and whom he spent his last days with. I'm certain he would want to help. submitted by /u/mininggotboring [link] [comments]  ( 1 min )
  • Open

    Data Augmented Healthcare Part 2
    In the previous part, we discussed the current state of data imaging tools in healthcare and the future applications of these technologies. While increased access to information is invaluable to physicians, they can still be limited by their own ability to interpret, or the physical limitations of their surgical ability. In addition to augmenting the… Read More »Data Augmented Healthcare Part 2 The post Data Augmented Healthcare Part 2 appeared first on Data Science Central.  ( 3 min )
  • Open

    Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
    Posted by Sharan Narang and Aakanksha Chowdhery, Software Engineers, Google Research In recent years, large neural networks trained for language understanding and generation have achieved impressive results across a wide range of tasks. GPT-3 first showed that large language models (LLMs) can be used for few-shot learning and can achieve impressive results without large-scale task-specific data collection or model parameter updating. More recent LLMs, such as GLaM, LaMDA, Gopher, and Megatron-Turing NLG, achieved state-of-the-art few-shot results on many tasks by scaling model size, using sparsely activated modules, and training on larger datasets from more diverse sources. Yet much work remains in understanding the capabilities that emerge with few-shot learning as we push the limits of …  ( 9 min )
  • Open

    Meet the Omnivore: Videographer Makes Digital Walls, Virtual Homes Pop With NVIDIA Omniverse
    Pekka Varis’s artistry has come a long way from his early days as a self-styled “punk activist” who spray painted during the “old school days of hip hop in Finland.” The post Meet the Omnivore: Videographer Makes Digital Walls, Virtual Homes Pop With NVIDIA Omniverse appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Composing Music with Neural Networks
    Hey guys, ​ I really love creating music algorithmically, which is why I have dedicated my master’s thesis to the generation of music patterns by the use of artificial intelligence. In the course of the past 12 months, I have programmed a deep recurrent neural network in Python, which I have trained on 200 self-made music patterns in order to generate somehow novel motifs. ​ In order to evaluate my model, I have set up a short online listening experiment. I’m looking for test subjects right now, so if you are interested in participating, I would really appreciate it. The listening experiment will take you just about 5 to 8 minutes to complete and the only thing you need is a pair of headphones. You can partake on your computer as well as on your smartphone or tablet. ​ Here is the link which gets you to the listening experiment: https://forms.gle/rx1FUQ7RgpjMu1xx9 ​ Thank you very much for taking the time to help me reach my goal. Really appreciate it. submitted by /u/JosephdeLaquinta [link] [comments]  ( 1 min )

  • Open

    [D] Unconventional computer vision problems that are intrinsically different from classifying ordinary stuff
    Given that most benchmarks for image classification are based on regular, everyday RGB (or grayscale) images of real-world objects, what are some unconventional scientific cases where 2D inputs are substantially different from what we are used to perceiving by eye? For example, I'm interested in cases where spatial information can't be constrained to narrow pixel value ranges, such as exponential signals, or where standard normalization (say min-max, z-score) and normalization layers are not applicable and could lead to loss of information. One such case is astronomy; however, most practitioners try to adapt the problem to established standards (say fake RGB images, log-scaled flux images, etc.). What are other cases out there where the nature of the 2D inputs is very distinct from what we are used to parsing with our eyes and what deep nets are benchmarked on? I'm curious about tailored solutions that would intrinsically change the way the deep nets are constructed to solve the research question. submitted by /u/astroferreira [link] [comments]  ( 1 min )
    [Research][Project][Library] Dog-fooding a new Machine Learning data tool
    Hi everyone! I'm Atindriyo Sanyal, one of the founders of the ML company Galileo (https://rungalileo.io/). We're building a cool new tool/framework for ML practitioners that helps shine a light on the data you are training your models with. I'd love to get some feedback on the product, and since we're still in private beta, I'm looking for folks to try out the product on their datasets and models. It's easy to use and hooks into popular frameworks such as pyTorch, Tensorflow, Keras, SpaCy etc. Caveat: Currently the tool only works for NLP use cases (think text classification, NER etc). I'll be giving $100 to folks who are willing to give some time to this and provide feedback on the usability of the product. If you're interested, here's a really tiny form (should take <1 minute to fill) for you to fill out. I'll review the applications and send you an email for a follow up Zoom chat where I'll share the software artifacts with you! https://docs.google.com/forms/d/11V20C_J_SyNaX7QL6DasnTe7f0UiueUyaKdmt3xL1oI/edit Look forward and happy (machine) learning! - Atindriyo P.S. If you have any questions or want to chat personally, send me an email at [atin@rungalileo.io](mailto:atin@rungalileo.io). submitted by /u/atindriyo_galileo [link] [comments]  ( 1 min )
    [D] Pain points when using GPU instance platforms
    Hi everyone, I just launched a GPU compute instance platform (think lambdalabs, fluidstack, aws EC2, vast), and I was wondering what pain points everyone has with existing solutions. I'm not trying to sell anyone anything, but I want feedback that will help me build a better product. My current thoughts: ease of getting data into the platform; ease of getting data off of the platform; automation for spinning instances up and down; availability of the type of instance you want; price too high; not enough/too many abstractions. TIA and I look forward to some good discussions! submitted by /u/runpod-io [link] [comments]  ( 1 min )
    [R] Efficient-VDVAE: An open-source memory-efficient and stable very deep hierarchical VAE
    Hello everyone :) Last week we released our paper "Efficient-VDVAE: Less is more", with code! We present simple modifications to the Very Deep VAE that make it converge up to 2.6x faster and reduce memory load by up to 20x. We also introduce a gradient smoothing technique to improve stability during training. Our model achieves comparable or better negative log-likelihood (NLL) on 7 commonly used datasets. Additionally, we make an argument against existing 5-bit benchmarks. We also show empirically that 3% of the latent space is enough to encode the data without any performance loss, indicating the potential to efficiently leverage the hierarchical VAE's latent space in downstream tasks. Paper: https://arxiv.org/abs/2203.13751 Code: https://github.com/Rayhane-mamah/Efficient-VDVAE Paperswithcode: https://paperswithcode.com/paper/efficient-vdvae-less-is-more Feedback is very much appreciated! submitted by /u/Louay-AI [link] [comments]  ( 1 min )
    [R] DeepDPM: Deep Clustering With an Unknown Number of Clusters
    Hey everyone :) We've just released the code for our paper (accepted to CVPR 2022). DeepDPM is a nonparametric deep-clustering method which, unlike most deep clustering methods, does not require knowing the number of clusters K; rather, it infers it as part of the overall learning. Using a split/merge framework to change the number of clusters adaptively and a novel loss, our proposed method outperforms existing (both classical and deep) nonparametric methods. While the few existing deep nonparametric methods lack scalability, we demonstrate ours by being the first such method to report its performance on ImageNet. Paper: https://arxiv.org/abs/2203.14309 Code: https://github.com/BGU-CS-VIL/DeepDPM/ Below are some examples of clusters our method found in ImageNet. https://preview.redd.it/jw5kvcuzfbr81.jpg?width=737&format=pjpg&auto=webp&s=5b61cdd0efdea7c92aba611171e5dc7f4276c892 submitted by /u/shahaff32 [link] [comments]  ( 4 min )
    [D] Why are confidence regions elliptic?
    Confidence regions are the 2D version of a confidence interval. Almost everywhere in the literature, the shape is elliptic, but no justification is provided. You would think that a confidence region of level γ is defined as the domain of minimum area covering a mass γ of the underlying probability distribution. That sounds perfectly logical, but it is mentioned nowhere. Based on this definition, the boundary of a confidence region is obtained by solving an optimization problem in the calculus of variations: finding a boundary curve enclosing a domain of minimum area. These problems are usually hard to solve, but in this case the solution seems trivial: it must be a contour line of the density. And if the underlying distribution is Gaussian, contour lines are obviously ellipses. This would be a solid justification for why ellipses are so widespread. My question here is whether my argument makes sense, or if there is something faulty in my math. I discuss it in more detail in one of my articles, here. If you need clarifications, please reply on Reddit and I will do my best to explain. submitted by /u/MLRecipes [link] [comments]  ( 2 min )
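    The contour-line claim can at least be checked numerically in the simplest case. For the standard bivariate normal, the squared Mahalanobis distance is chi-squared with 2 degrees of freedom, whose quantile has the closed form -2 ln(1-γ); the level-γ contour is then the circle of that squared radius, and it should enclose exactly mass γ. A small Monte Carlo sketch (stdlib only, assuming the standard bivariate normal for simplicity; for a general covariance the ellipse axes come from its eigendecomposition):

    ```python
    import math
    import random

    def mahalanobis_sq_threshold(gamma):
        """Chi-squared(2 dof) quantile in closed form: CDF(x) = 1 - exp(-x/2),
        so the level-gamma contour of a standard 2D Gaussian is the circle
        x^2 + y^2 = -2 ln(1 - gamma)."""
        return -2.0 * math.log(1.0 - gamma)

    def covered_mass(gamma, n=200_000, seed=0):
        """Fraction of standard bivariate normal samples inside the contour."""
        rng = random.Random(seed)
        r2 = mahalanobis_sq_threshold(gamma)
        inside = sum(
            1 for _ in range(n)
            if rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 <= r2
        )
        return inside / n
    ```

    For γ = 0.95 the threshold is about 5.99, and the sampled fraction inside the contour comes out close to 0.95, consistent with the contour being the minimum-area region of that mass.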
    [D] New Scaling Laws for Large Language Models
    https://www.lesswrong.com/posts/midXmMb2Xg37F2Kgn/new-scaling-laws-for-large-language-models submitted by /u/Singularian2501 [link] [comments]  ( 1 min )
  • Open

    Blog: Let’s manually approximate a simple function with a ReLU neural network
    submitted by /u/rhkibria [link] [comments]
  • Open

    Top Ways in Which AI Impacts Grocery Retail
    Artificial intelligence has been long making waves globally, empowering companies from across the broad spectrum of industries to take their businesses to the next level. So it is no surprise that this technology is making inroads in the grocery retail space, helping grocers deliver personalized and irreproachable experiences across different channels, establishing improved customer loyalty,… Read More »Top Ways in Which AI Impacts Grocery Retail The post Top Ways in Which AI Impacts Grocery Retail appeared first on Data Science Central.  ( 3 min )
    A different take on business intelligence
    Data is useless if it doesn’t shed light. The more light it sheds on the most acute problems businesses face, the better. Within this context, data synergy–data from multiple sources and disciplines that is more valuable than the sum of its parts–is often underappreciated. With data synergy, the light can be in many more places,… Read More »A different take on business intelligence The post A different take on business intelligence appeared first on Data Science Central.  ( 4 min )
    Blockchain Won’t Save The Metaverse
    Blockchain is widely touted as a mechanism for securing digital property. Multiple problems exist for driving metaverse transactions. A new review highlights the challenges, some of which may be insurmountable. Blockchain has been touted as a potential solution to securing users’ digital content and data due to its decentralization, immutability, and transparency. However, there are… Read More »Blockchain Won’t Save The Metaverse The post Blockchain Won’t Save The Metaverse appeared first on Data Science Central.  ( 4 min )
  • Open

    How does the ACER algorithm work?
    I am currently writing a report on reinforcement learning, where I am trying to describe how the ACER algorithm works. I have read the arXiv paper on sample-efficient actor-critic with experience replay, but I don't understand where the experience replay comes in. Is it part of the policy gradient, where the policy is updated every episode using the knowledge the agent gathered in previous episodes? https://arxiv.org/pdf/1611.01224.pdf submitted by /u/beepingwater_neko [link] [comments]  ( 1 min )
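    Roughly: in ACER the replay buffer stores trajectories generated by an old behavior policy μ, and later gradient updates replay them under the current policy π, correcting the mismatch with an importance weight π(a|s)/μ(a|s), which ACER truncates to limit variance. A hedged, heavily simplified sketch of that machinery (not the paper's full algorithm, which adds Retrace targets and a trust region):

    ```python
    import random
    from collections import deque

    class ReplayBuffer:
        def __init__(self, capacity=10_000):
            self.buffer = deque(maxlen=capacity)

        def store(self, state, action, reward, behavior_prob):
            # behavior_prob = mu(action|state) at collection time; saved so
            # the importance weight can be computed at replay time.
            self.buffer.append((state, action, reward, behavior_prob))

        def sample(self, k, seed=None):
            rng = random.Random(seed)
            return rng.sample(list(self.buffer), k)

    def truncated_is_weight(pi_prob, mu_prob, c=10.0):
        """ACER-style truncated importance weight min(c, pi/mu)."""
        return min(c, pi_prob / mu_prob)
    ```

    So the replay is not "part of" the policy gradient itself; it is the data source for off-policy gradient updates, and the importance weights are what make those updates valid under the current policy.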
    What’s the best way to implement tree-based function approximators for RL/control?
    Sorry if this post is not appropriate here, but I have been wondering how to implement and learn a decision tree or another non-differentiable function approximator for the value function. It's relatively easy to formulate DQN-type algorithms using a neural network with, say, PyTorch plus stochastic optimization, but I want to try out some tree-based methods (at least to reproduce papers which claim to use them). But I don't know: 1) Do we have to design the structures and learning algorithms by hand, or is there a package I can use? 2) How should the learning be done? We obviously can't do plain regression-type learning because of the bootstrapping nature of the Bellman equation? Thanks submitted by /u/Htaseht [link] [comments]  ( 1 min )
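    The classic answer to 2) is Fitted Q-Iteration (Ernst et al., 2005, "Tree-Based Batch Mode Reinforcement Learning"): the bootstrapping is handled by iterating plain supervised regression. At iteration k you build targets y = r + γ max_a' Q_{k-1}(s', a') from a fixed batch of transitions and fit a fresh regressor to (s, a) → y, so for 1) any off-the-shelf regression package works (sklearn's ExtraTreesRegressor is the usual choice for the tree ensemble). A sketch generic over the regressor, with a toy 1-nearest-neighbor stand-in to keep it dependency-free:

    ```python
    def fit_1nn(X, y):
        """Toy stand-in for a tree ensemble: predict the target of the
        nearest training point (squared Euclidean distance)."""
        def predict(x):
            d = [sum((u - v) ** 2 for u, v in zip(x, xi)) for xi in X]
            return y[d.index(min(d))]
        return predict

    def fitted_q_iteration(transitions, actions, gamma=0.9, n_iters=20,
                           fit=fit_1nn):
        """transitions: list of (state, action, reward, next_state, done).
        States are tuples; actions a finite list. Returns Q(s, a)."""
        q = lambda s, a: 0.0                      # Q_0 = 0
        for _ in range(n_iters):
            X, y = [], []
            for s, a, r, s2, done in transitions:
                target = r if done else r + gamma * max(q(s2, b) for b in actions)
                X.append(s + (a,))                # regress on (state, action)
                y.append(target)
            model = fit(X, y)                     # fresh supervised fit
            q = lambda s, a, m=model: m(s + (a,))
        return q

    # Tiny 2-state chain: action 1 from state (0,) reaches terminal reward 1.
    data = [((0,), 1, 0.0, (1,), False), ((1,), 1, 1.0, (1,), True),
            ((0,), 0, 0.0, (0,), False)]
    Q = fitted_q_iteration(data, actions=[0, 1])
    ```

    Swapping `fit_1nn` for a function that wraps `ExtraTreesRegressor(...).fit(X, y).predict` recovers the method from the papers; each iteration is genuinely a regression problem, so no hand-designed learning algorithm is needed.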
    I'm working on a DQN agent using the keras-rl library to play Atari games; a weird thing keeps happening where every episode has the same length, but that length is a random number each run.
    The episode step count is the same for training and testing. submitted by /u/Gleann_na_nGealt [link] [comments]  ( 1 min )
    Are there any real life projects I could do with this? How do I get ideas to use this?
    I tried using RL for some work at my university, and it did not work all that well. I'm wondering if there are some real-life scenarios I could use to create my own personal projects; otherwise, I'd be fine with games. I want to try all of it, from dynamic programming to crazy stuff like A3C, PPO and so on. I like RL more than any other form of ML, and I want to play around with it as a hobby. This is really the area of ML for me. For starters, there are fundamentals behind it, so you can mostly explain why agents do one thing or another. Additionally, there isn't a need for massive amounts of data. It's also the only type of ML that I've been able to successfully combine with software engineering: designing the agent and the API it uses to take actions in an environment is as much a software engineering project as creating a REST API. I feel there is a lot of potential for me to go crazy with this, and I was wondering if people have any cool suggestions. Anything real-time is exactly the sort of thing I love to do; real-time systems and RL together are what I want to work on. submitted by /u/HesperusIII [link] [comments]  ( 2 min )
  • Open

    AI News | ALS Brain Computer Interface 1 Year Human Trial Results | Skin Cancer Detection | New IBM AI Hardware
    submitted by /u/getrich_or_diemining [link] [comments]
    Your Next Teacher Will be a Machine: Why the Future of Education is Automation
    submitted by /u/itsallshit-eatup [link] [comments]
    Hi! I'm wondering if anyone could help me 🇦🇷🇦🇷
    I'm a 19-year-old guy from Argentina studying systems engineering. I like my career, and being an engineer is great, but coding and AI are greater. I'm tired of courses like Free code academy and of basic things; I'm looking for more professional, useful and deeper courses that will really teach me. I'm currently at the basics of Python (pandas, numpy, matplotlib, tensorflow) and want to get better in this field that I love ❤ submitted by /u/Sasulanda [link] [comments]  ( 1 min )
    Active non-ML research areas?
    What are the most active non-ML/statistical research areas in AI? Are there any recent books published that give an overview of such areas? Seems like AI is now either ML or people saying that ML won’t work, but vague on alternatives. submitted by /u/spookyplatypus [link] [comments]  ( 1 min )
    Heard about Github Copilot? Now Meet Salesforce's 'CodeGen’ : An AI Model That Turns Simple Natural Language Requests Into Executable Code
    Imagine being able to tell a machine to write an app simply by telling it what the app does. As far-fetched as it may appear, this scenario is already a reality. According to Salesforce AI Research, conversational AI programming is a new paradigm that brings this vision to life, thanks to an AI system that builds software. Introducing CodeGen: Creating Programs from Prompts. The large-scale language model CodeGen, which converts simple English prompts into executable code, is the first step toward this objective. The person doesn't write any code; instead, they describe what the code should do in natural language, and the computer does the rest. Conversational AI refers to technologies that allow a human and a computer to engage naturally through conversation; chatbots, voice assistants, and virtual agents are examples. Continue Reading Paper: https://arxiv.org/pdf/2203.13474.pdf Github: https://github.com/salesforce/CodeGen submitted by /u/No_Coffee_4638 [link] [comments]  ( 2 min )
  • Open

    Dan Huttenlocher ponders our human future in an age of artificial intelligence
    For the MIT Schwarzman College of Computing dean, bringing disciplines together is the best way to address challenges and opportunities posed by rapid advancements in computing.  ( 8 min )
  • Open

    Enhancing Satellite Imagery using Deep Learning for the Sensor To Shooter Timeline. (arXiv:2203.00116v3 [cs.CV] UPDATED)
    The sensor to shooter timeline is affected by two main variables: satellite positioning and asset positioning. Speeding up satellite positioning by adding more sensors or by decreasing processing time is important only if there is a prepared shooter; otherwise, the main source of time is getting the shooter into position. However, the intelligence community should work toward exploiting sensors at the highest speed and effectiveness possible. Achieving high effectiveness while keeping speed high is a tradeoff that must be considered in the sensor to shooter timeline. In this paper we investigate two main ideas: increasing the effectiveness of satellite imagery through image manipulation, and how on-board image manipulation would affect the sensor to shooter timeline. We cover these ideas in four scenarios: discrete event simulation of onboard processing versus ground station processing, quality of information with cloud cover removal, information improvement with super resolution, and data reduction with image-to-caption. This paper shows how image manipulation techniques such as super resolution, cloud removal, and image-to-caption improve the quality of delivered information, in addition to showing how those processes affect the sensor to shooter timeline.  ( 2 min )

  • Open

    AgentZero: Ray & PyTorch based light-weight Distributed Fast Reinforcement Learning Framework
    AgentZero https://github.com/zhoubin-me/agent0 This is my personal project, developed two years ago. It covers major DRL algorithms: [DQN](https://arxiv.org/abs/1312.5602), [Double DQN](https://arxiv.org/abs/1509.06461), [Dueling DQN](https://arxiv.org/abs/1511.06581), [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952), [Noisy Network](https://arxiv.org/abs/1706.10295), [C51](https://arxiv.org/abs/1707.06887), [Rainbow](https://arxiv.org/abs/1710.02298), [QR-DQN](https://arxiv.org/abs/1710.10044), [IQR](https://arxiv.org/abs/1806.06923), [FQF](https://arxiv.org/abs/1911.02140), [DDPG](https://arxiv.org/abs/1509.02971), [SAC](https://arxiv.org/abs/1801.01290), [TD3](https://arxiv.org/abs/1802.09477), [MDQN](https://arxiv.org/abs/2007.14430). What is amazing is its speed and memory efficiency after some optimization: with a single 2080Ti GPU and an 8-core AMD CPU, the training speed of Rainbow on Atari reaches 3000 FPS, which means it can finish training on 10M frames within 1 hour. With compression of image frames, replay memory RAM usage is down by 20%. I have tested several algorithms and games on Atari and have some initial results. Welcome to use and contribute! submitted by /u/zhoubin-me [link] [comments]  ( 1 min )
    Regularization for DRL: reward or objective function?
    I am searching for regularization methods applied to DRL algorithms (either value- or policy-based) to understand what has been done so far in the field. I cannot find any solid reference that studies the effect of applying a soft constraint to the reward function instead of to the policy objective. This may seem useless for some applications, but it is highly relevant for finance, which is my domain of application. The idea I have so far is that if you constrain the reward, it is like imposing limits on the agent's behavior. On the other hand, if you constrain the objective, you are not limiting the behavior but correcting the undesired behavior ex post. The latter does not allow the agent to learn not to behave in a certain way. Did anyone ever think about this? Are there good references that analyze the different effects of such a constraint on any DRL algorithm? submitted by /u/alebrini [link] [comments]  ( 1 min )
    What does it mean to feed the "network state" in an LSTM in the actor network?
    I am looking at this code from Google (https://github.com/google-research/google-research/blob/master/social_rl/multiagent_tfagents/joint_attention/attention_networks.py). At line 639, the LSTM is called. The first two inputs are the state and the network state, but I don't understand what the latter is. submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
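    The "network state" is the LSTM's recurrent state: the pair (hidden state h, cell state c) carried between timesteps, which a recurrent actor network must thread through each call. A bare-bones numpy sketch of one LSTM step (illustrative only, not the tf_agents implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, state, W, U, b):
    """One LSTM step; `state` is the recurrent "network state": (h, c)."""
    h, c = state
    z = W @ x + U @ h + b                      # stacked pre-activations for 4 gates
    i, f, o, g = np.split(z, 4)                # input, forget, output, candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, (h_new, c_new)               # (output, new network state)

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

state = (np.zeros(n_hid), np.zeros(n_hid))     # zeros before the episode starts
for t in range(5):                             # thread the state through time
    out, state = lstm_step(rng.normal(size=n_in), state, W, U, b)
```

    Frameworks expose this explicitly so that training code can reset the state at episode boundaries and carry it across calls during a rollout.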
    DeepRL and Rubik’s Cube
    I'm part of a group of researchers from top ML institutions and industry; our goal is to figure out how to improve efficiency in DeepRL. We are looking at the Rubik's Cube as a target problem, and kicking off a project which will start from https://github.com/forestagostinelli/DeepCubeA and go from there. Prior works require a hand-crafted curriculum and billions of interactions to solve a cube; we believe that is orders of magnitude more compute than it should take. Is anyone interested in collaborating? I'm happy to dedicate a few hours a week to help a newcomer (like I was a few years ago) with the RL stuff, given some basics of machine learning and programming skills, and this could be the golden opportunity for someone to see RL at scale. submitted by /u/mind_library [link] [comments]  ( 2 min )
    Multi agent reinforcement learning
    I'm absolutely new to machine learning, let alone reinforcement learning. I've been tasked to replicate, and if possible improve upon, the paper linked in the post. I don't know what platform to use or how to create the custom environment. If anybody could share any resources, it would be tremendously helpful. https://drive.google.com/file/d/1fIT43hKi61WUIvoTh2a3AWlRsphi-L98/view?usp=sharing submitted by /u/Lazarus_07 [link] [comments]  ( 1 min )
    q-learning vs. policy gradient
    Trying to wrap my head around the RL essentials. Would it be correct to say that Q-learning attempts to recover the best available policy indirectly, by optimizing the Q-function, while policy gradient methods directly optimize the parameters of a parameterized policy? submitted by /u/JimBeanery [link] [comments]  ( 1 min )
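    Roughly yes: Q-learning learns action values and derives a (greedy) policy from them, while policy-gradient methods parameterize the policy itself and ascend the gradient of expected return with respect to those parameters. A minimal tabular/linear sketch of the two update rules (illustrative only):

```python
import numpy as np

# Tabular Q-learning: learn action values; the policy is implicit (greedy in Q).
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * np.max(Q[s_next])     # bootstrap off the best next action
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# REINFORCE-style policy gradient: adjust the policy's own parameters theta
# in the direction that makes high-return actions more probable.
def pg_update(theta, s_feats, a, G, lr=0.01):
    probs = softmax(theta @ s_feats)           # parameterized policy pi(a|s; theta)
    grad_logp = (np.eye(len(probs))[a] - probs)[:, None] * s_feats[None, :]
    return theta + lr * G * grad_logp          # ascend return-weighted score function
```

    Note the contrast: the Q-update never touches policy parameters (the policy is just argmax over Q), while the policy-gradient update never estimates a value, it moves the action distribution directly.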
    How to use a deep model for DRL?
    I noticed most DRL papers use very shallow models like three or four layers. However, when I try to do DRL tasks that have relatively complicated scenes (for example, some modern video game), shallow models become way too weak. Are there papers, blogs, articles etc. that use more complex/deep models? Or maybe some methods that can deal with complicated scenes without deep models? Thanks submitted by /u/seermer [link] [comments]  ( 1 min )
    Any thought on: A universal parameter optimizer
    Hi all, I have a thought on a universal parameter optimizer that I wanted to share with you, to see if you know of some related work. Assume you have a simulation or access to an environment. There are certain parameters you can set to control the performance of a system which lives in this simulation/environment. Naturally, one wants to find the optimal parameters, or the optimal policy for setting the parameters, that results in the most reward, however that is defined. For example, in the stock market, I may want to find the optimal market price to buy and sell, or the optimal policy. In a car-driving game, I may want to determine the optimal policy to set speed and direction. Do we know if there is a formal way to study this type of problem? Thank you! submitted by /u/DB8868 [link] [comments]  ( 1 min )
    Building Trust with Responsible AI
    Artificial Intelligence is being used in almost every aspect of life. AI symbolizes growth and productivity in the minds of some, but it is also raising questions about the fairness, privacy, and security of these systems. Many legitimate issues exist, including biased choices, labor replacement, and a lack of security. When it comes to robots, this is very frightening: self-driving automobiles, for example, can cause injury or death if they make mistakes. Responsible AI addresses these difficulties and makes AI systems more accountable. Responsible AI should fulfill the following aims:
Interpretability: We obtain an explanation for how a model makes predictions when we interpret it. An AI system makes predictions for a user; even if these decisions are correct, a user is likely to seek an explanation. Responsible AI can describe how we create interpretable models.
Fairness: AI systems have the potential to make judgments that are biased towards particular groups of people. This bias originates in the training data. The more interpretable a model is, the easier it is to assure fairness and rectify any bias. As a result, we need a Responsible AI framework to explain how we evaluate fairness and what to do if a model makes unjust predictions.
Safety and Security: AI systems aren't deterministic. When confronted with new situations, they are prone to making poor choices, and the systems can even be tampered with into making unwise decisions. Therefore, we need to ensure safety and security in these systems.
Data Governance: The data used must be of high quality. If the data used by AI has errors, the system may make wrong decisions.
Continue Reading The Article Here https://preview.redd.it/9iivp31ir6r81.png?width=1024&format=png&auto=webp&s=207409694b68a1e985ad1dfcf3b466ac25916da2 submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    It's unbelievable what ML can do! Disco+RIFE= 7hr Colab Run...
    submitted by /u/JoshGrambo [link] [comments]
    Creating A Chatbot with transformers and Gradio
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Researchers Develop Parking Analytics Framework Using Deep Learning
    Artificial Intelligence and deep learning in video analytics are gaining popularity. It has enabled a wide range of industrial applications, including surveillance and public safety, robotics perception, medical intervention, and facial recognition. According to Markets & Markets, the global market for video analytics was valued at USD 5.9 billion in 2021 and is predicted to reach USD 14.9 billion by 2026. Unmanned aerial vehicles (UAVs) have also enabled a wide range of video analytics applications (e.g., aerial surveys) since they provide aerial views of the environment, allowing for collecting aerial photos and processing with deep learning algorithms. Parking analytics is one of these critical smart city applications that uses deep learning and UAVs to collect real-time data and analyze it in order to maximize parking revenue, enhance parking resource allocations, and better manage public space. Continue Reading Paper: https://arxiv.org/pdf/2203.07792.pdf ​ https://i.redd.it/u5th7z0ja5r81.gif submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    How will AI impact games
    submitted by /u/GravermanYT [link] [comments]
    Oscillations in RLC circuits
    Electrical and mechanical oscillations satisfy analogous equations. This is the basis of using the word “analog” in electronics. You could study a mechanical system by building an analogous circuit and measuring that circuit in a lab. Mass, dashpot, spring Years ago I wrote a series of four posts about mechanical vibrations: Free, undamped vibrations Free, […] Oscillations in RLC circuits first appeared on John D. Cook.  ( 2 min )
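    The analogy the post refers to can be written side by side: a damped mass-spring system with displacement x and a series RLC circuit with charge q satisfy

```latex
% Mechanical: mass m, dashpot c, spring k
m\,\ddot{x} + c\,\dot{x} + k\,x = F(t)
% Electrical: inductance L, resistance R, capacitance C
L\,\ddot{q} + R\,\dot{q} + \tfrac{1}{C}\,q = V(t)
```

    so inductance plays the role of mass, resistance of the dashpot, and the reciprocal capacitance 1/C of the spring constant.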
    [D] How Logistic Regression nomogram is constructed from binary classifier?
    Well, I've been reading some scientific works and I don't understand how nomograms are constructed from logistic regression models. Here is the example I have: https://ieeexplore-1ieee-1org-1000007l206e9.han.bg.pg.edu.pl/document/9514609 They train an LR model on a Covid-19 dataset [died / didn't die], so it's a binary classification problem. However, later on they construct a nomogram which determines whether there is low/moderate/high risk of Covid-19 mortality. What I don't understand is how they calculate the score to establish the chance of death, e.g. if the score is <0.05 there is a low possibility that the patient will die. My general question is: how did they construct this nomogram from the binary classifier they had? submitted by /u/s168501 [link] [comments]  ( 1 min )
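    The usual construction: each feature's nomogram axis is its contribution beta_i * x_i to the linear predictor, rescaled so the most influential feature spans 0 to 100 points; the summed points are then mapped back through the logistic function to a probability, and "low/moderate/high" are just thresholds on that probability. A sketch with hypothetical coefficients (not the paper's actual model):

```python
import math

# Hypothetical fitted LR model (coefficients, intercept, and feature ranges
# are made up for illustration; the paper's actual model differs).
betas = {"age": 0.04, "crp": 0.8, "spo2": -0.12}
intercept = -1.5
ranges = {"age": (20, 90), "crp": (0, 10), "spo2": (70, 100)}

# Maximal contribution of each feature to the linear predictor over its range.
max_effect = {f: abs(b) * (ranges[f][1] - ranges[f][0]) for f, b in betas.items()}
scale = 100.0 / max(max_effect.values())       # strongest feature spans 0..100 points

def points(feature, value):
    """Nomogram points: the feature's beta*x contribution, rescaled to 0..100."""
    lo, hi = ranges[feature]
    b = betas[feature]
    ref = lo if b > 0 else hi                  # anchor where the contribution is lowest
    return abs(b * (value - ref)) * scale

def risk(values):
    """The total-points axis maps back to probability via the logistic link."""
    lp = intercept + sum(betas[f] * v for f, v in values.items())
    return 1.0 / (1.0 + math.exp(-lp))
```

    Reading the nomogram is then: score each feature on its points axis, sum the points, and look the total up on the probability axis.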
    [R] Neural Head Avatars from Monocular RGB Videos (CVPR 2022)
    submitted by /u/Mandelmus100 [link] [comments]
    [D] Predicting hard properties of graphs using Machine Learning
    Hello everyone, there is a lot of work in the field of geometric deep learning for combinatorial optimization that yields good approximation algorithms for "hard" problems on graphs (see here), with the most prominent example being the TSP. However, as far as I can see, all these problems share the fact that the computed solution is a subset of the vertices/edges of the original graph. In my field (graph drawing), one of the most important properties is the crossing number. Hence, the solution would not consist of a labeling of the edges/vertices, but is rather a regression task on the whole graph. I have a dataset that consists of roughly 10000 graphs together with their crossing numbers. Treating the above problem as a supervised regression task and simply feeding the graph into a GNN does not work for me at all. Is this a problem with my choice of architecture, or is the "function" that maps a graph to its crossing number something we can expect no current architecture to find? I appreciate any comment, even if it is just your intuition on the problem. Best regards, MrLemming submitted by /u/MrLemming2 [link] [comments]  ( 1 min )
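    For graph-level regression, the standard recipe is message passing followed by a permutation-invariant global readout (sum/mean pooling) feeding a head that emits one scalar per graph. A dependency-free sketch of that pipeline with random, untrained weights (illustrative; whether any architecture can actually approximate the crossing number is exactly the open question):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gnn_regress(A, X, W_msg, W_out, n_rounds=3):
    """A: (n, n) adjacency, X: (n, d) node features. Returns one scalar per graph."""
    H = X
    for _ in range(n_rounds):
        # Mean-aggregate neighbor features, then a shared linear + ReLU update.
        deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
        H = relu(((A @ H) / deg) @ W_msg)
    g = H.sum(axis=0)            # permutation-invariant global sum readout
    return float(g @ W_out)      # scalar prediction, e.g. crossing number

rng = np.random.default_rng(0)
n, d = 6, 4
A = (rng.random((n, n)) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T   # symmetric, no self-loops
X = rng.normal(size=(n, d))
W_msg = rng.normal(size=(d, d))
W_out = rng.normal(size=d)
y_hat = gnn_regress(A, X, W_msg, W_out)
```

    Because the readout is invariant to node relabeling, the prediction is identical for any permutation of the same graph, which is necessary (though far from sufficient) for learning a graph invariant like the crossing number.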
    [N] Announcement: Call for Papers for our ICML ShiftHappens Workshop!
    Dear community, I hope I do not violate the rules by advertising our Call for Papers here. In a nutshell, submissions can be robustness or OOD datasets and new metrics, which we will consolidate into one benchmark. More info on our website. I am happy to answer any questions regarding the call. submitted by /u/helavisa4 [link] [comments]  ( 1 min )
    [P] OpenAI Codex helping to write shell commands
    submitted by /u/tomd_96 [link] [comments]  ( 1 min )
    [R][P] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [D] PetaFLOPS as a Unit of Measure in Machine Learning Applications
    I was looking at this paper (https://arxiv.org/pdf/2005.14165.pdf) and came across this graph: https://preview.redd.it/49ydy9bo61r81.png?width=665&format=png&auto=webp&s=729743449d6a99d2b84b81610e7e32d87ea4dfeb I am trying to understand the following things about this graph:
What is PetaFLOP/s-days? I read that a PetaFLOP is 1,000,000,000,000,000 calculations (e.g. additions, subtractions). I am guessing that 10^2 would imply 100 * 1,000,000,000,000,000 calculations per day - is this correct? Is there any difference between PetaFLOP/days and PetaFLOP/s-days? (I also find it interesting that they refer to "computational resources" as simply "compute".)
What does "C" stand for in L = 2.57 * C^-0.048? I am guessing that the dotted line refers to the average loss for neural networks with differing numbers of parameters - but what exactly does "C" stand for?
Finally, is there a reason that "Validation Loss" is not expressed as a percentage? For instance, what is a validation loss of 3? Is a validation loss of 3 the same as a loss of 30%? Or does validation loss simply refer to the value of the loss function obtained during the validation stage of cross-validation? Thank you! submitted by /u/blueest [link] [comments]  ( 2 min )
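    On the units: a petaFLOP/s-day is an amount of compute, not a rate. It is what you accumulate by sustaining 10^15 floating-point operations per second for one day, so the x-axis value is a total operation count (and "PetaFLOP/days" without the /s is at best ambiguous); in that figure, C is this compute quantity. A quick sanity-check conversion, with an illustrative total-FLOP budget:

```python
# A petaFLOP/s-day is a quantity of compute, not a rate: 10^15 floating-point
# operations per second, sustained for one day's worth of seconds.
PFLOPS = 1e15
SECONDS_PER_DAY = 86_400
FLOPS_PER_PFSDAY = PFLOPS * SECONDS_PER_DAY    # 8.64e19 operations

# Illustrative: convert a hypothetical total training budget to PF/s-days.
total_flops = 3.14e23
compute_in_pfsdays = total_flops / FLOPS_PER_PFSDAY
print(round(compute_in_pfsdays))               # -> 3634
```

    The validation loss itself is just the value of the (cross-entropy) loss function on held-out data, which is why it is not a percentage.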
    What is an Experimentation program and Who is Involved? (Experimentation Program Series: Guide 02)
    In my previous post, I briefly described how leading companies use experimentation to optimize their products and services and evolve them to the point of feeling elegant, efficient, and magical. These companies have developed mature experimentation programs (ExPrs), including the… Read More The post What is an Experimentation program and Who is Involved? (Experimentation Program Series: Guide 02) appeared first on ML in Production.  ( 6 min )
    NN from Scratch: #1 Data Preprocessing | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
    C++ Machine Learning Book
    Hey, guys. Just want to ask if anybody's interested in a C++ machine learning book, "Hands-On Machine Learning with C++" by Kirill Kolodiazhnyi. If you are, send me a DM. submitted by /u/edmondgrasa [link] [comments]
    Efficient net vs resnet
    As the title says, I would like to know, or get some direction on, the question: when, in general, is an EfficientNet preferred to a ResNet? I understand that the paper compares performances and shows higher performance than every other network. So my question is: is that always the case, or is there a specific situation where it would be better? Sorry for the typos (on my mobile) submitted by /u/johnyj01 [link] [comments]  ( 1 min )
    MyStyle: A Personalized Generative Prior. (arXiv:2203.17272v1 [cs.CV])
    We introduce MyStyle, a personalized deep generative prior trained with a few shots of an individual. MyStyle allows one to reconstruct, enhance, and edit images of a specific person, such that the output is faithful to the person's key facial characteristics. Given a small reference set of portrait images of a person (~100), we tune the weights of a pretrained StyleGAN face generator to form a local, low-dimensional, personalized manifold in the latent space. We show that this manifold constitutes a personalized region that spans latent codes associated with diverse portrait images of the individual. Moreover, we demonstrate that we obtain a personalized generative prior, and propose a unified approach to apply it to various ill-posed image enhancement problems, such as inpainting and super-resolution, as well as semantic editing. Using the personalized generative prior we obtain outputs that exhibit high-fidelity to the input images and are also faithful to the key facial characteristics of the individual in the reference set. We demonstrate our method with fair-use images of numerous widely recognizable individuals for whom we have the prior knowledge for a qualitative evaluation of the expected outcome. We evaluate our approach against few-shots baselines and show that our personalized prior, quantitatively and qualitatively, outperforms state-of-the-art alternatives.  ( 2 min )
    Muesli: Combining Improvements in Policy Optimization. (arXiv:2104.06159v2 [cs.LG] UPDATED)
    We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.  ( 2 min )
    Improved Relation Networks for End-to-End Speaker Verification and Identification. (arXiv:2203.17218v1 [eess.AS])
    Speaker identification systems in a real-world scenario are tasked to identify a speaker amongst a set of enrolled speakers given just a few samples for each enrolled speaker. This paper demonstrates the effectiveness of meta-learning and relation networks for this use case. We propose improved relation networks for speaker verification and few-shot (unseen) speaker identification. The use of relation networks facilitates joint training of the frontend speaker encoder and the backend model. Inspired by the use of prototypical networks in speaker verification and to increase the discriminability of the speaker embeddings, we train the model to classify samples in the current episode amongst all speakers present in the training set. Furthermore, we propose a new training regime for faster model convergence by extracting more information from a given meta-learning episode with negligible extra computation. We evaluate the proposed techniques on VoxCeleb, SITW and VCTK datasets on the tasks of speaker verification and unseen speaker identification. The proposed approach outperforms the existing approaches consistently on both tasks.  ( 2 min )
    Demystifying the Transferability of Adversarial Attacks in Computer Networks. (arXiv:2110.04488v3 [cs.CR] UPDATED)
    Convolutional Neural Network (CNN) models are among the most frequently used deep learning networks, and are extensively used in both academia and industry. Recent studies demonstrated that adversarial attacks against such models can maintain their effectiveness even when used on models other than the one targeted by the attacker. This major property is known as transferability, and makes CNNs ill-suited for security applications. In this paper, we provide the first comprehensive study which assesses the robustness of CNN-based models for computer networks against adversarial transferability. Furthermore, we investigate whether the transferability property holds in computer network applications. In our experiments, we first consider five different attacks: the Iterative Fast Gradient Sign Method (I-FGSM), the Jacobian-based Saliency Map Attack (JSMA), the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) attack, Projected Gradient Descent (PGD), and the DeepFool attack. Then, we perform these attacks against three well-known datasets: the Network-based Detection of IoT (N-BaIoT) dataset, the Domain Generating Algorithms (DGA) dataset, and the RIPE Atlas dataset. Our experimental results show clearly that transferability happens in specific use cases for I-FGSM, JSMA, and the L-BFGS attack. In such scenarios, the attack success rate on the target network ranges from 63.00% to 100%. Finally, we suggest two shielding strategies to hinder the attack transferability, by considering the Most Powerful Attacks (MPAs) and a mismatched LSTM architecture.  ( 2 min )
    DeepEdge: A Deep Reinforcement Learning based Task Orchestrator for Edge Computing. (arXiv:2110.01863v2 [cs.NI] UPDATED)
    The improvements in the edge computing technology pave the road for diversified applications that demand real-time interaction. However, due to the mobility of the end-users and the dynamic edge environment, it becomes challenging to handle the task offloading with high performance. Moreover, since each application in mobile devices has different characteristics, a task orchestrator must be adaptive and have the ability to learn the dynamics of the environment. For this purpose, we develop a deep reinforcement learning based task orchestrator, DeepEdge, which learns to meet different task requirements without needing human interaction even under the heavily-loaded stochastic network conditions in terms of mobile users and applications. Given the dynamic offloading requests and time-varying communication conditions, we successfully model the problem as a Markov process and then apply the Double Deep Q-Network (DDQN) algorithm to implement DeepEdge. To evaluate the robustness of DeepEdge, we experiment with four different applications including image rendering, infotainment, pervasive health, and augmented reality in the network under various loads. Furthermore, we compare the performance of our agent with the four different task offloading approaches in the literature. Our results show that DeepEdge outperforms its competitors in terms of the percentage of satisfactorily completed tasks.  ( 2 min )
    Model-based Reinforcement Learning: A Survey. (arXiv:2006.16712v4 [cs.LG] UPDATED)
    Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two sections, we also discuss implicit model-based RL as an end-to-end alternative for model learning and planning, and we cover the potential benefits of model-based RL. Along the way, the survey also draws connections to several related RL fields, like hierarchical RL and transfer learning. Altogether, the survey presents a broad conceptual overview of the combination of planning and learning for MDP optimization.  ( 2 min )
    TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing. (arXiv:2203.17266v1 [cs.CV])
    Recent advances like StyleGAN have promoted the growth of controllable facial editing. To address its core challenge of attribute decoupling in a single latent space, attempts have been made to adopt dual-space GAN for better disentanglement of style and content representations. Nonetheless, these methods are still incompetent to obtain plausible editing results with high controllability, especially for complicated attributes. In this study, we highlight the importance of interaction in a dual-space GAN for more controllable editing. We propose TransEditor, a novel Transformer-based framework to enhance such interaction. Besides, we develop a new dual-space editing and inversion strategy to provide additional editing flexibility. Extensive experiments demonstrate the superiority of the proposed framework in image quality and editing capability, suggesting the effectiveness of TransEditor for highly controllable facial editing.  ( 2 min )
    A Unifying Framework for Reinforcement Learning and Planning. (arXiv:2006.15009v4 [cs.LG] UPDATED)
    Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are reinforcement learning and planning, which both largely have their own research communities. However, if both research fields solve the same problem, then we might be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have to decide. At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions. Altogether, the framework may help provide deeper insight in the algorithmic design space of planning and reinforcement learning.  ( 2 min )
    Data Augmentation for Opcode Sequence Based Malware Detection. (arXiv:2106.11821v2 [cs.CR] UPDATED)
    In this paper we study data augmentation for opcode sequence based Android malware detection. Data augmentation has been successfully used in many areas of deep-learning to significantly improve model performance. Typically, data augmentation simulates realistic variations in data to increase the apparent diversity of the training-set. However, for opcode-based malware analysis it is not immediately clear how to apply data augmentation. Hence we first study the use of fixed transformations, then progress to adaptive methods. We propose a novel data augmentation method -- Self-Embedding Language Model Augmentation -- that uses a malware detection network's own opcode embedding layer to measure opcode similarity for adaptive augmentation. To the best of our knowledge this is the first paper to carry out a systematic study of different augmentation methods for opcode sequence based Android malware classification.  ( 2 min )
    Recommender Systems meet Mechanism Design. (arXiv:2110.12558v2 [cs.GT] UPDATED)
    Machine learning has developed a variety of tools for learning and representing high-dimensional distributions with structure. Recent years have also seen big advances in designing multi-item mechanisms. Akin to overfitting, however, these mechanisms can be extremely sensitive to the Bayesian prior that they target, which becomes problematic when that prior is only approximately known. At the same time, even if access to the exact Bayesian prior is given, it is known that optimal or even approximately optimal multi-item mechanisms run into sample, computational, representation and communication intractability barriers. We consider a natural class of multi-item mechanism design problems with very large numbers of items, but where the bidders' value distributions can be well-approximated by a topic model akin to those used in recommendation systems with very large numbers of possible recommendations. We propose a mechanism design framework for this setting, building on a recent robustification framework by Brustle et al., which disentangles the statistical challenge of estimating a multi-dimensional prior from the task of designing a good mechanism for it, and robustifies the performance of the latter against the estimation error of the former. We provide an extension of this framework appropriate for our setting, which allows us to exploit the expressive power of topic models to reduce the effective dimensionality of the mechanism design problem and remove the dependence of its computational, communication and representation complexity on the number of items.  ( 2 min )
    Preliminary Steps Towards Federated Sentiment Classification. (arXiv:2107.11956v2 [cs.CL] UPDATED)
    Automatically mining the sentiment tendency contained in natural language is fundamental research for several artificial intelligence applications, where solutions alternate with challenges. Transfer learning and multi-task learning techniques have been leveraged to mitigate the supervision sparsity and collaborate across multiple heterogeneous domains, respectively. In recent years, the sensitive nature of users' private data raises another challenge for sentiment classification, i.e., data privacy protection. In this paper, we resort to federated learning for multiple domain sentiment classification under the constraint that the corpora must be stored on decentralized devices. In view of the heterogeneous semantics across multiple parties and the peculiarities of word embedding, we pertinently provide corresponding solutions. First, we propose a Knowledge Transfer Enhanced Private-Shared (KTEPS) framework for better model aggregation and personalization in federated sentiment classification. Second, we propose KTEPS$^\star$ with the consideration of the rich semantic and huge embedding size properties of word vectors, utilizing Projection-based Dimension Reduction (PDR) methods for privacy protection and efficient transmission simultaneously. We propose two federated sentiment classification scenes based on public benchmarks, and verify the superiorities of our proposed methods with abundant experimental investigations.  ( 2 min )
    Causal Feature Selection for Algorithmic Fairness. (arXiv:2006.06053v2 [cs.LG] UPDATED)
    The use of machine learning (ML) in high-stakes societal decisions has encouraged the consideration of fairness throughout the ML lifecycle. Although data integration is one of the primary steps to generate high quality training data, most of the fairness literature ignores this stage. In this work, we consider fairness in the integration component of data management, aiming to identify features that improve prediction without adding any bias to the dataset. We work under the causal interventional fairness paradigm. Without requiring the underlying structural causal model a priori, we propose an approach to identify a sub-collection of features that ensure the fairness of the dataset by performing conditional independence tests between different subsets of features. We use group testing to improve the complexity of the approach. We theoretically prove the correctness of the proposed algorithm to identify features that ensure interventional fairness and show that sub-linear conditional independence tests are sufficient to identify these variables. A detailed empirical evaluation is performed on real-world datasets to demonstrate the efficacy and efficiency of our technique.  ( 2 min )
    When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?. (arXiv:2110.04184v2 [cs.LG] UPDATED)
    Multi-agent reinforcement learning has made substantial empirical progresses in solving games with a large number of players. However, theoretically, the best known sample complexity for finding a Nash equilibrium in general-sum games scales exponentially in the number of players due to the size of the joint action space, and there is a matching exponential lower bound. This paper investigates what learning goals admit better sample complexities in the setting of $m$-player general-sum Markov games with $H$ steps, $S$ states, and $A_i$ actions per player. First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5S\max_{i\le m} A_i / \epsilon^2)$ episodes, and an $\epsilon$-Correlated Equilibrium (CE) in $\widetilde{\mathcal{O}}(H^6S\max_{i\le m} A_i^2 / \epsilon^2)$ episodes. This is the first line of results for learning CCE and CE with sample complexities polynomial in $\max_{i\le m} A_i$. Our algorithm for learning CE integrates an adversarial bandit subroutine which minimizes a weighted swap regret, along with several novel designs in the outer loop. Second, we consider the important special case of Markov Potential Games, and design an algorithm that learns an $\epsilon$-approximate Nash equilibrium within $\widetilde{\mathcal{O}}(S\sum_{i\le m} A_i / \epsilon^3)$ episodes (when only highlighting the dependence on $S$, $A_i$, and $\epsilon$), which only depends linearly in $\sum_{i\le m} A_i$ and significantly improves over existing efficient algorithm in the $\epsilon$ dependence. Overall, our results shed light on what equilibria or structural assumptions on the game may enable sample-efficient learning with many players.  ( 2 min )
    TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining. (arXiv:2110.13492v4 [cs.LG] UPDATED)
    We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension. The proposed architecture simplifies the UNet backbone of the TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate performance degradation. We also utilize self-supervised pretraining and data augmentation to enhance the quality of bandwidth extended signals and reduce the sensitivity with respect to downsampling methods. Experiment results on the VCTK dataset show that the proposed method outperforms several recent baselines in both intrusive and non-intrusive metrics. Pretraining and filter augmentation also help stabilize and enhance the overall performance.  ( 2 min )
    Instance Weighted Incremental Evolution Strategies for Reinforcement Learning in Dynamic Environments. (arXiv:2010.04605v2 [cs.LG] UPDATED)
Evolution strategies (ES), as a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient, and are much faster when many central processing units (CPUs) are available due to better parallelization. In this paper, we propose a systematic incremental learning method for ES in dynamic environments. The goal is to adjust a previously learned policy to a new one incrementally whenever the environment changes. We incorporate an instance weighting mechanism with ES to facilitate its learning adaptation, while retaining the scalability of ES. During parameter updating, higher weights are assigned to instances that contain more new knowledge, thus encouraging the search distribution to move towards new promising areas of parameter space. We propose two easy-to-implement metrics to calculate the weights: instance novelty and instance quality. Instance novelty measures an instance's difference from the previous optimum in the original environment, while instance quality corresponds to how well an instance performs in the new environment. The resulting algorithm, Instance Weighted Incremental Evolution Strategies (IW-IES), is verified to achieve significantly improved performance on challenging RL tasks ranging from robot navigation to locomotion. This paper thus introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.  ( 2 min )
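The instance-weighting mechanism described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the paper's exact IW-IES algorithm: the `fitness_fn`/`novelty_fn` interfaces, the min-max normalisation, and the 50/50 mix of quality and novelty weights are all hypothetical choices.

```python
import numpy as np

def weighted_es_update(theta, sigma, pop_size, fitness_fn, novelty_fn, lr=0.01):
    """One instance-weighted ES step (illustrative sketch, not exact IW-IES).

    Each sampled perturbation ("instance") is weighted by a mix of its
    quality (fitness in the new environment) and novelty (distance from
    the previous optimum), so the search distribution drifts toward
    promising new regions of parameter space.
    """
    eps = np.random.randn(pop_size, theta.size)        # sampled perturbations
    candidates = theta + sigma * eps
    quality = np.array([fitness_fn(c) for c in candidates])
    novelty = np.array([novelty_fn(c) for c in candidates])

    def norm(x):
        # Min-max normalise a metric to [0, 1] (assumed choice).
        return (x - x.min()) / (x.max() - x.min() + 1e-8)

    # Combine the two metrics into instance weights (assumed 50/50 mix).
    w = 0.5 * norm(quality) + 0.5 * norm(novelty)
    # Weighted ES gradient estimate over the sampled perturbations.
    grad = (w[:, None] * eps).sum(axis=0) / (pop_size * sigma)
    return theta + lr * grad
```

In a dynamic environment, `novelty_fn` would measure distance from the pre-change optimum and `fitness_fn` would evaluate rollouts in the changed environment.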
    Rerunning OCR: A Machine Learning Approach to Quality Assessment and Enhancement Prediction. (arXiv:2110.01661v4 [cs.CL] UPDATED)
Iterating with new and improved OCR solutions enforces decision making when it comes to targeting the right candidates for reprocessing. This especially applies when the underlying data collection is of considerable size and rather diverse in terms of fonts, languages, periods of publication and consequently OCR quality. This article captures the efforts of the National Library of Luxembourg to support those targeting decisions. They are crucial to guarantee low computational overhead and reduced quality degradation risks, combined with a more quantifiable OCR improvement. In particular, this work explains the methodology of the library with respect to text block level quality assessment. Through extension of this technique, a regression model that is able to take into account the enhancement potential of a new OCR engine is also presented. Both mark promising approaches, especially for cultural institutions dealing with historical data of lower quality.  ( 2 min )
    Continual Speaker Adaptation for Text-to-Speech Synthesis. (arXiv:2103.14512v2 [cs.CL] UPDATED)
    Training a multi-speaker Text-to-Speech (TTS) model from scratch is computationally expensive and adding new speakers to the dataset requires the model to be re-trained. The naive solution of sequential fine-tuning of a model for new speakers can lead to poor performance of older speakers. This phenomenon is known as catastrophic forgetting. In this paper, we look at TTS modeling from a continual learning perspective, where the goal is to add new speakers without forgetting previous speakers. Therefore, we first propose an experimental setup and show that serial fine-tuning for new speakers can cause the forgetting of the earlier speakers. Then we exploit two well-known techniques for continual learning, namely experience replay and weight regularization. We reveal how one can mitigate the effect of degradation in speech synthesis diversity in sequential training of new speakers using these methods. Finally, we present a simple extension to experience replay to improve the results in extreme setups where we have access to very small buffers.  ( 2 min )
    Adversarial Examples in Random Neural Networks with General Activations. (arXiv:2203.17209v1 [cs.LG])
    A substantial body of empirical work documents the lack of robustness in deep learning models to adversarial examples. Recent theoretical work proved that adversarial examples are ubiquitous in two-layers networks with sub-exponential width and ReLU or smooth activations, and multi-layer ReLU networks with sub-exponential width. We present a result of the same type, with no restriction on width and for general locally Lipschitz continuous activations. More precisely, given a neural network $f(\,\cdot\,;{\boldsymbol \theta})$ with random weights ${\boldsymbol \theta}$, and feature vector ${\boldsymbol x}$, we show that an adversarial example ${\boldsymbol x}'$ can be found with high probability along the direction of the gradient $\nabla_{{\boldsymbol x}}f({\boldsymbol x};{\boldsymbol \theta})$. Our proof is based on a Gaussian conditioning technique. Instead of proving that $f$ is approximately linear in a neighborhood of ${\boldsymbol x}$, we characterize the joint distribution of $f({\boldsymbol x};{\boldsymbol \theta})$ and $f({\boldsymbol x}';{\boldsymbol \theta})$ for ${\boldsymbol x}' = {\boldsymbol x}-s({\boldsymbol x})\nabla_{{\boldsymbol x}}f({\boldsymbol x};{\boldsymbol \theta})$.  ( 2 min )
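The construction $\boldsymbol{x}' = \boldsymbol{x} - s(\boldsymbol{x})\nabla_{\boldsymbol{x}} f(\boldsymbol{x};\boldsymbol{\theta})$ can be illustrated numerically on a random two-layer ReLU network. The network sizes, the step schedule, and the iteration cap below are arbitrary illustrative choices; the paper's result is about a single suitably chosen step $s(\boldsymbol{x})$, whereas this sketch takes several small signed steps along the gradient direction so the sign change of the output is easy to observe.

```python
import numpy as np

# Random two-layer ReLU network f(x) = w2 . relu(W1 x) with random weights
# (illustrative sizes; the paper's result holds for general widths/activations).
rng = np.random.default_rng(0)
d, m = 50, 200
W1 = rng.standard_normal((m, d)) / np.sqrt(d)
w2 = rng.standard_normal(m) / np.sqrt(m)

def f(x):
    return float(w2 @ np.maximum(W1 @ x, 0.0))

def grad_f(x):
    # Input gradient of the network (defined almost everywhere for ReLU).
    return W1.T @ (w2 * (W1 @ x > 0))

x = rng.standard_normal(d)
x_adv = x.copy()
# Move along the gradient direction, x' = x - s(x) grad f(x), in small
# signed steps until the sign of the output flips (or the budget runs out).
for _ in range(200):
    if f(x_adv) * f(x) <= 0:
        break
    g = grad_f(x_adv)
    s = 0.1 * abs(f(x)) / (np.linalg.norm(g) ** 2 + 1e-12)
    x_adv = x_adv - np.sign(f(x)) * s * g
```

For random weights, the output sign typically flips after a handful of such steps, matching the intuition that adversarial examples lie along the input gradient.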
    A 23 MW data centre is all you need. (arXiv:2203.17265v1 [cs.LG])
    The field of machine learning has achieved striking progress in recent years, witnessing breakthrough results on language modelling, protein folding and nitpickingly fine-grained dog breed classification. Some even succeeded at playing computer games and board games, a feat both of engineering and of setting their employers' expectations. The central contribution of this work is to carefully examine whether this progress, and technology more broadly, can be expected to continue indefinitely. Through a rigorous application of statistical theory and failure to extrapolate beyond the training data, we answer firmly in the negative and provide details: technology will peak at 3:07 am (BST) on 20th July, 2032. We then explore the implications of this finding, discovering that individuals awake at this ungodly hour with access to a sufficiently powerful computer possess an opportunity for myriad forms of long-term linguistic 'lock in'. All we need is a large (>> 1W) data centre to seize this pivotal moment. By setting our analogue alarm clocks, we propose a tractable algorithm to ensure that, for the future of humanity, the British spelling of colour becomes the default spelling across more than 80% of the global word processing software market.  ( 2 min )
    Automatic Detection of Expressed Emotion from Five-Minute Speech Samples: Challenges and Opportunities. (arXiv:2203.17242v1 [cs.SD])
    We present a novel feasibility study on the automatic recognition of Expressed Emotion (EE), a family environment concept based on caregivers speaking freely about their relative/family member. We describe an automated approach for determining the \textit{degree of warmth}, a key component of EE, from acoustic and text features acquired from a sample of 37 recorded interviews. These recordings, collected over 20 years ago, are derived from a nationally representative birth cohort of 2,232 British twin children and were manually coded for EE. We outline the core steps of extracting usable information from recordings with highly variable audio quality and assess the efficacy of four machine learning approaches trained with different combinations of acoustic and text features. Despite the challenges of working with this legacy data, we demonstrated that the degree of warmth can be predicted with an $F_{1}$-score of \textbf{61.5\%}. In this paper, we summarise our learning and provide recommendations for future work using real-world speech samples.
    VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers. (arXiv:2203.17247v1 [cs.CV])
    Breakthroughs in transformer-based models have revolutionized not only the NLP field, but also vision and multimodal systems. However, although visualization and interpretability tools have become available for NLP models, internal mechanisms of vision and multimodal transformers remain largely opaque. With the success of these transformers, it is increasingly critical to understand their inner workings, as unraveling these black-boxes will lead to more capable and trustworthy models. To contribute to this quest, we propose VL-InterpreT, which provides novel interactive visualizations for interpreting the attentions and hidden representations in multimodal transformers. VL-InterpreT is a task agnostic and integrated tool that (1) tracks a variety of statistics in attention heads throughout all layers for both vision and language components, (2) visualizes cross-modal and intra-modal attentions through easily readable heatmaps, and (3) plots the hidden representations of vision and language tokens as they pass through the transformer layers. In this paper, we demonstrate the functionalities of VL-InterpreT through the analysis of KD-VLP, an end-to-end pretraining vision-language multimodal transformer-based model, in the tasks of Visual Commonsense Reasoning (VCR) and WebQA, two visual question answering benchmarks. Furthermore, we also present a few interesting findings about multimodal transformer behaviors that were learned through our tool.
    Wind Farm Layout Optimisation using Set Based Multi-objective Bayesian Optimisation. (arXiv:2203.17065v1 [stat.ML])
Wind energy is one of the cleanest renewable electricity sources and can help in addressing the challenge of climate change. One of the drawbacks of wind-generated energy is the large space necessary to install a wind farm; this arises from the fact that placing wind turbines in a limited area would hinder their productivity and therefore not be economically convenient. This naturally leads to an optimisation problem, which has three specific challenges: (1) multiple conflicting objectives, (2) computationally expensive simulation models, and (3) optimisation over design sets instead of design vectors. The first and second challenges can be addressed by using surrogate-assisted (e.g. Bayesian) multi-objective optimisation. However, traditional Bayesian optimisation cannot be applied, as the objective in this problem relies on design sets instead of design vectors. This paper extends the applicability of Bayesian multi-objective optimisation to set-based optimisation for solving the wind farm layout problem. We use a set-based kernel in a Gaussian process to quantify the correlation between wind farms (with a different number of turbines). The results on the given data set of wind energy and direction clearly show the potential of using set-based Bayesian multi-objective optimisation.
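The set-based kernel idea (comparing layouts that may contain different numbers of turbines) can be sketched with a simple mean-pairwise substitution kernel. This is an assumed illustrative kernel, not necessarily the one used in the paper:

```python
import numpy as np

def set_kernel(X, Y, gamma=1.0):
    """Mean-pairwise RBF kernel between two point sets (e.g. turbine layouts).

    A simple substitution kernel: average the RBF kernel over all pairs of
    turbine positions, so layouts with different turbine counts remain
    directly comparable. Illustrative sketch of the set-kernel idea only.
    """
    # Pairwise squared distances between every point of X and every point of Y.
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return float(np.exp(-gamma * sq).mean())
```

Plugging such a kernel into a Gaussian process gives a surrogate whose inputs are whole layouts rather than fixed-length design vectors.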
    OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization. (arXiv:2106.03721v3 [cs.LG] UPDATED)
    Deep learning has achieved tremendous success with independent and identically distributed (i.i.d.) data. However, the performance of neural networks often degenerates drastically when encountering out-of-distribution (OoD) data, i.e., when training and test data are sampled from different distributions. While a plethora of algorithms have been proposed for OoD generalization, our understanding of the data used to train and evaluate these algorithms remains stagnant. In this work, we first identify and measure two distinct kinds of distribution shifts that are ubiquitous in various datasets. Next, through extensive experiments, we compare OoD generalization algorithms across two groups of benchmarks, each dominated by one of the distribution shifts, revealing their strengths on one shift as well as limitations on the other shift. Overall, we position existing datasets and algorithms from different research areas seemingly unconnected into the same coherent picture. It may serve as a foothold that can be resorted to by future OoD generalization research. Our code is available at https://github.com/ynysjtu/ood_bench.
    Does Audio Deepfake Detection Generalize?. (arXiv:2203.16263v2 [cs.SD] UPDATED)
    Current text-to-speech algorithms produce realistic fakes of human voices, making deepfake detection a much-needed area of research. While researchers have presented various techniques for detecting audio spoofs, it is often unclear exactly why these architectures are successful: Preprocessing steps, hyperparameter settings, and the degree of fine-tuning are not consistent across related work. Which factors contribute to success, and which are accidental? In this work, we address this problem: We systematize audio spoofing detection by re-implementing and uniformly evaluating architectures from related work. We identify overarching features for successful audio deepfake detection, such as using cqtspec or logspec features instead of melspec features, which improves performance by 37% EER on average, all other factors constant. Additionally, we evaluate generalization capabilities: We collect and publish a new dataset consisting of 37.9 hours of found audio recordings of celebrities and politicians, of which 17.2 hours are deepfakes. We find that related work performs poorly on such real-world data (performance degradation of up to one thousand percent). This may suggest that the community has tailored its solutions too closely to the prevailing ASVSpoof benchmark and that deepfakes are much harder to detect outside the lab than previously thought.
    Optimisation-free Classification and Density Estimation with Quantum Circuits. (arXiv:2203.14452v2 [quant-ph] UPDATED)
    We demonstrate the implementation of a novel machine learning framework for probability density estimation and classification using quantum circuits. The framework maps a training data set or a single data sample to the quantum state of a physical system through quantum feature maps. The quantum state of the arbitrarily large training data set summarises its probability distribution in a finite-dimensional quantum wave function. By projecting the quantum state of a new data sample onto the quantum state of the training data set, one can derive statistics to classify or estimate the density of the new data sample. Remarkably, the implementation of our framework on a real quantum device does not require any optimisation of quantum circuit parameters. Nonetheless, we discuss a variational quantum circuit approach that could leverage quantum advantage for our framework.
    Deep Multi-modal Fusion of Image and Non-image Data in Disease Diagnosis and Prognosis: A Review. (arXiv:2203.15588v2 [cs.LG] UPDATED)
    The rapid development of diagnostic technologies in healthcare is leading to higher requirements for physicians to handle and integrate the heterogeneous, yet complementary data that are produced during routine practice. For instance, the personalized diagnosis and treatment planning for a single cancer patient relies on the various images (e.g., radiological, pathological, and camera images) and non-image data (e.g., clinical data and genomic data). However, such decision-making procedures can be subjective, qualitative, and have large inter-subject variabilities. With the recent advances in multi-modal deep learning technologies, an increasingly large number of efforts have been devoted to a key question: how do we extract and aggregate multi-modal information to ultimately provide more objective, quantitative computer-aided clinical decision making? This paper reviews the recent studies on dealing with such a question. Briefly, this review will include the (1) overview of current multi-modal learning workflows, (2) summarization of multi-modal fusion methods, (3) discussion of the performance, (4) applications in disease diagnosis and prognosis, and (5) challenges and future directions.
    TraHGR: Transformer for Hand Gesture Recognition via ElectroMyography. (arXiv:2203.16336v2 [eess.SP] UPDATED)
Deep learning-based Hand Gesture Recognition (HGR) via surface Electromyogram (sEMG) signals has recently shown significant potential for development of advanced myoelectric-controlled prosthesis. Existing deep learning approaches typically include only one model and as such can hardly maintain acceptable generalization performance in changing scenarios. In this paper, we aim to address this challenge by capitalizing on the recent advances of hybrid models and transformers. In other words, we propose a hybrid framework based on the transformer architecture, which is a relatively new and revolutionizing deep learning model. The proposed hybrid architecture, referred to as the Transformer for Hand Gesture Recognition (TraHGR), consists of two parallel paths followed by a linear layer that acts as a fusion center to integrate the advantage of each module and provide robustness over different scenarios. We evaluated the proposed architecture TraHGR based on the commonly used second Ninapro dataset, referred to as the DB2. The sEMG signals in the DB2 dataset are measured in real-life conditions from 40 healthy users, each performing 49 gestures. We have conducted an extensive set of experiments to test and validate the proposed TraHGR architecture, and have compared its achievable accuracy with more than five recently proposed HGR classification algorithms over the same dataset. We have also compared the results of the proposed TraHGR architecture with each individual path and demonstrated the distinguishing power of the proposed hybrid architecture. The recognition accuracies of the proposed TraHGR architecture are 86.18%, 88.91%, 81.44%, and 93.84%, which are 2.48%, 5.12%, 8.82%, and 4.30% higher than the state-of-the-art performance for DB2 (49 gestures), DB2-B (17 gestures), DB2-C (23 gestures), and DB2-D (9 gestures), respectively.
    D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions. (arXiv:2112.03028v2 [cs.RO] CROSS LISTED)
    We introduce the dynamic grasp synthesis task: given an object with a known 6D pose and a grasp reference, our goal is to generate motions that move the object to a target 6D pose. This is challenging, because it requires reasoning about the complex articulation of the human hand and the intricate physical interaction with the object. We propose a novel method that frames this problem in the reinforcement learning framework and leverages a physics simulation, both to learn and to evaluate such dynamic interactions. A hierarchical approach decomposes the task into low-level grasping and high-level motion synthesis. It can be used to generate novel hand sequences that approach, grasp, and move an object to a desired location, while retaining human-likeness. We show that our approach leads to stable grasps and generates a wide range of motions. Furthermore, even imperfect labels can be corrected by our method to generate dynamic interaction sequences.
    Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment. (arXiv:2203.15937v1 [eess.AS] CROSS LISTED)
    Current leading mispronunciation detection and diagnosis (MDD) systems achieve promising performance via end-to-end phoneme recognition. One challenge of such end-to-end solutions is the scarcity of human-annotated phonemes on natural L2 speech. In this work, we leverage unlabeled L2 speech via a pseudo-labeling (PL) procedure and extend the fine-tuning approach based on pre-trained self-supervised learning (SSL) models. Specifically, we use Wav2vec 2.0 as our SSL model, and fine-tune it using original labeled L2 speech samples plus the created pseudo-labeled L2 speech samples. Our pseudo labels are dynamic and are produced by an ensemble of the online model on-the-fly, which ensures that our model is robust to pseudo label noise. We show that fine-tuning with pseudo labels gains a 5.35% phoneme error rate reduction and 2.48% MDD F1 score improvement over a labeled-samples-only fine-tuning baseline. The proposed PL method is also shown to outperform conventional offline PL methods. Compared to the state-of-the-art MDD systems, our MDD solution achieves a more accurate and consistent phonetic error diagnosis. In addition, we conduct an open test on a separate UTD-4Accents dataset, where our system recognition outputs show a strong correlation with human perception, based on accentedness and intelligibility.
    UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning. (arXiv:2203.14542v2 [cs.CV] UPDATED)
Supervised deep learning methods require a large repository of annotated data; hence, label noise is inevitable. Training with such noisy data negatively impacts the generalization performance of deep neural networks. To combat label noise, recent state-of-the-art methods employ some sort of sample selection mechanism to select a possibly clean subset of data. Next, an off-the-shelf semi-supervised learning method is used for training where rejected samples are treated as unlabeled data. Our comprehensive analysis shows that current selection methods disproportionately select samples from easy (fast learnable) classes while rejecting those from relatively harder ones. This creates class imbalance in the selected clean set and, in turn, deteriorates performance under high label noise. In this work, we propose UNICON, a simple yet effective sample selection method which is robust to high label noise. To address the disproportionate selection of easy and hard samples, we introduce a Jensen-Shannon divergence based uniform selection mechanism which does not require any probabilistic modeling and hyperparameter tuning. We complement our selection method with contrastive learning to further combat the memorization of noisy labels. Extensive experimentation on multiple benchmark datasets demonstrates the effectiveness of UNICON; we obtain an 11.4% improvement over the current state-of-the-art on the CIFAR100 dataset with a 90% noise rate. Our code is publicly available.
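The class-uniform, divergence-based selection can be sketched as follows. This is an illustrative sketch in the spirit of UNICON, not its exact procedure: the function names, the one-hot comparison target, and the fixed selection fraction are assumptions for the example.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def uniform_select(probs, labels, num_classes, frac=0.5):
    """Select an equal number of lowest-JSD ("cleanest") samples per class.

    `probs` are model softmax outputs and `labels` the (possibly noisy)
    labels; samples whose predictions agree with their labels get low JSD.
    Selecting per class avoids the bias toward easy classes described above.
    """
    d = np.array([js_divergence(probs[i], np.eye(num_classes)[labels[i]])
                  for i in range(len(labels))])
    per_class = int(frac * len(labels) / num_classes)
    selected = []
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        selected.extend(idx[np.argsort(d[idx])][:per_class])  # cleanest first
    return np.array(sorted(selected))
```

Because the same number of samples is taken from every class, the selected "clean" set stays balanced even when some classes are learned faster than others.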
    Well-classified Examples are Underestimated in Classification with Deep Neural Networks. (arXiv:2110.06537v5 [cs.LG] UPDATED)
The conventional wisdom behind learning deep classification models is to focus on badly classified examples and ignore well-classified examples that are far from the decision boundary. For instance, when training with cross-entropy loss, examples with higher likelihoods (i.e., well-classified examples) contribute smaller gradients in back-propagation. However, we theoretically show that this common practice hinders representation learning, energy optimization, and margin growth. To counteract this deficiency, we propose to reward well-classified examples with additive bonuses to revive their contribution to the learning process. This counterexample theoretically addresses these three issues. We empirically support this claim by directly verifying the theoretical results or significant performance improvement with our counterexample on diverse tasks, including image classification, graph classification, and machine translation. Furthermore, this paper shows that we can deal with complex scenarios, such as imbalanced classification, OOD detection, and applications under adversarial attacks because our idea can solve these three issues. Code is available at: https://github.com/lancopku/well-classified-examples-are-underestimated.
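The additive-bonus idea can be sketched numerically. This is an assumed instantiation for illustration, not necessarily the paper's exact bonus: here the bonus is a mirrored cross-entropy term, -log(1 - p), which keeps the gradient of confident (high-likelihood) examples alive, linearised above a threshold so it stays bounded as p approaches 1.

```python
import numpy as np

def encouraging_loss(probs, targets, log_end=0.75):
    """Cross-entropy plus an additive bonus rewarding well-classified examples.

    Sketch of the additive-bonus mechanism (assumed form, hypothetical
    `log_end` threshold): -log(1 - p) rewards high true-class likelihood p,
    and is linearised for p > log_end for numerical stability.
    """
    p = probs[np.arange(len(targets)), targets]        # true-class likelihoods
    ce = -np.log(np.clip(p, 1e-12, None))              # standard cross-entropy
    bonus = np.where(
        p <= log_end,
        -np.log(np.clip(1.0 - p, 1e-12, None)),        # mirrored cross-entropy
        -np.log(1.0 - log_end) + (p - log_end) / (1.0 - log_end),  # linear tail
    )
    return float((ce + bonus).mean())
```

With plain cross-entropy, the gradient for a well-classified example vanishes as p tends to 1; adding this bonus keeps such examples contributing to representation learning and margin growth, matching the abstract's motivation.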
    Omnivore: A Single Model for Many Visual Modalities. (arXiv:2201.08377v2 [cs.CV] UPDATED)
    Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data. Instead, in this paper, we propose a single model which excels at classifying images, videos, and single-view 3D data using exactly the same model parameters. Our 'Omnivore' model leverages the flexibility of transformer-based architectures and is trained jointly on classification tasks from different modalities. Omnivore is simple to train, uses off-the-shelf standard datasets, and performs at-par or better than modality-specific models of the same size. A single Omnivore model obtains 86.0% on ImageNet, 84.1% on Kinetics, and 67.1% on SUN RGB-D. After finetuning, our models outperform prior work on a variety of vision tasks and generalize across modalities. Omnivore's shared visual representation naturally enables cross-modal recognition without access to correspondences between modalities. We hope our results motivate researchers to model visual modalities together.
    Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios. (arXiv:2203.09767v2 [cs.SD] UPDATED)
    Overlapping speech diarization has been traditionally treated as a multi-label classification problem. In this paper, we reformulate this task as a single-label prediction problem by encoding multiple binary labels into a single label with the power set, which represents the possible combinations of target speakers. This formulation has two benefits. First, the overlaps of target speakers are explicitly modeled. Second, threshold selection is no longer needed. Through this formulation, we propose the speaker embedding-aware neural diarization (SEND) framework, where a speech encoder, a speaker encoder, two similarity scorers, and a post-processing network are jointly optimized to predict the encoded labels according to the similarities between speech features and speaker embeddings. Experimental results show that SEND has a stable learning process and can be trained on highly overlapped data without extra initialization. More importantly, our method achieves the state-of-the-art performance in real meeting scenarios with fewer model parameters and lower computational complexity.
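The power-set encoding at the heart of this single-label reformulation is simple to state in code. A minimal sketch (function names hypothetical), mapping each combination of active target speakers to its own class index:

```python
def encode_powerset(active, num_speakers):
    """Map a binary speaker-activity vector to a single power-set class index.

    Each subset of the `num_speakers` target speakers gets its own class,
    so overlapped speech is an explicit class rather than a multi-label
    threshold decision. There are 2 ** num_speakers classes in total.
    """
    return sum(bit << i for i, bit in enumerate(active))

def decode_powerset(label, num_speakers):
    """Inverse mapping: class index back to the per-speaker binary vector."""
    return [(label >> i) & 1 for i in range(num_speakers)]
```

For example, with three target speakers, the frame where speakers 1 and 3 overlap becomes the single class index 5, and a plain argmax over the 8 classes replaces per-speaker threshold selection.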
    Forecasting from LiDAR via Future Object Detection. (arXiv:2203.16297v2 [cs.CV] UPDATED)
    Object detection and forecasting are fundamental components of embodied perception. These two problems, however, are largely studied in isolation by the community. In this paper, we propose an end-to-end approach for detection and motion forecasting based on raw sensor measurement as opposed to ground truth tracks. Instead of predicting the current frame locations and forecasting forward in time, we directly predict future object locations and backcast to determine where each trajectory began. Our approach not only improves overall accuracy compared to other modular or end-to-end baselines, it also prompts us to rethink the role of explicit tracking for embodied perception. Additionally, by linking future and current locations in a many-to-one manner, our approach is able to reason about multiple futures, a capability that was previously considered difficult for end-to-end approaches. We conduct extensive experiments on the popular nuScenes dataset and demonstrate the empirical effectiveness of our approach. In addition, we investigate the appropriateness of reusing standard forecasting metrics for an end-to-end setup, and find a number of limitations which allow us to build simple baselines to game these metrics. We address this issue with a novel set of joint forecasting and detection metrics that extend the commonly used AP metrics from the detection community to measuring forecasting accuracy. Our code is available at https://github.com/neeharperi/FutureDet
    Graph Neural Networks in IoT: A Survey. (arXiv:2203.15935v2 [cs.LG] UPDATED)
    The Internet of Things (IoT) boom has revolutionized almost every corner of people's daily lives: healthcare, home, transportation, manufacturing, supply chain, and so on. With the recent development of sensor and communication technologies, IoT devices including smart wearables, cameras, smartwatches, and autonomous vehicles can accurately measure and perceive their surrounding environment. Continuous sensing generates massive amounts of data and presents challenges for machine learning. Deep learning models (e.g., convolution neural networks and recurrent neural networks) have been extensively employed in solving IoT tasks by learning patterns from multi-modal sensory data. Graph Neural Networks (GNNs), an emerging and fast-growing family of neural network models, can capture complex interactions within sensor topology and have been demonstrated to achieve state-of-the-art results in numerous IoT learning tasks. In this survey, we present a comprehensive review of recent advances in the application of GNNs to the IoT field, including a deep dive analysis of GNN design in various IoT sensing environments, an overarching list of public data and source code from the collected publications, and future research directions. To keep track of newly published works, we collect representative papers and their open-source implementations and create a Github repository at https://github.com/GuiminDong/GNN4IoT.
    Longitudinal Fairness with Censorship. (arXiv:2203.16024v2 [cs.LG] UPDATED)
    Recent works in artificial intelligence fairness attempt to mitigate discrimination by proposing constrained optimization programs that achieve parity for some fairness statistic. Most assume availability of the class label, which is impractical in many real-world applications such as precision medicine, actuarial analysis and recidivism prediction. Here we consider fairness in longitudinal right-censored environments, where the time to event might be unknown, resulting in censorship of the class label and inapplicability of existing fairness studies. We devise applicable fairness measures, propose a debiasing algorithm, and provide necessary theoretical constructs to bridge fairness with and without censorship for these important and socially-sensitive tasks. Our experiments on four censored datasets confirm the utility of our approach.
    An Evaluation Dataset for Legal Word Embedding: A Case Study On Chinese Codex. (arXiv:2203.15173v1 [cs.CL] CROSS LISTED)
Word embedding is a modern distributed word representations approach widely used in many natural language processing tasks. Converting the vocabulary in a legal document into a word embedding model facilitates subjecting legal documents to machine learning, deep learning, and other algorithms, and subsequently performing downstream natural language processing tasks such as document classification, contract review, and machine translation. The most common and practical approach to accuracy evaluation with a word embedding model uses a benchmark set with linguistic rules or relationships between words to perform analogy reasoning via algebraic calculation. This paper proposes establishing a 1,134-question Legal Analogical Reasoning Questions Set (LARQS) from the 2,388-document Chinese Codex corpus using five kinds of legal relations, which are then used to evaluate the accuracy of Chinese word embedding models. Moreover, we discovered that legal relations might be ubiquitous in word embedding models.
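The "analogy reasoning via algebraic calculation" used by such benchmarks is the standard 3CosAdd-style evaluation, which can be sketched as follows (the dict-based embedding interface is an assumption for the example):

```python
import numpy as np

def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' by algebra on word vectors.

    Rank all vocabulary words by cosine similarity to v_b - v_a + v_c,
    excluding the three query words themselves. `emb` is assumed to be a
    dict mapping word -> unit-normalised vector.
    """
    target = emb[b] - emb[a] + emb[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for w, v in emb.items():
        if w in (a, b, c):
            continue  # the query words are never valid answers
        sim = float(v @ target)
        if sim > best_sim:
            best, best_sim = w, sim
    return best
```

A legal benchmark like LARQS works the same way, with pairs drawn from legal relations (e.g. an offence and its governing statute) instead of the classic king/queen lexical pairs.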
    Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization. (arXiv:2107.12438v3 [math.OC] UPDATED)
Motivated by the poor performance of cross-validation in settings where data are scarce, we propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization. Our approach exploits the optimization problem's sensitivity analysis to estimate the gradient of the optimal objective value with respect to the amount of noise in the data and uses the estimated gradient to debias the policy's in-sample performance. Unlike cross-validation techniques, our approach avoids sacrificing data for a test set, utilizes all data when training and, hence, is well-suited to settings where data are scarce. We prove bounds on the bias and variance of our estimator for optimization problems with uncertain linear objectives but known, potentially non-convex, feasible regions. For more specialized optimization problems where the feasible region is "weakly-coupled" in a certain sense, we prove stronger results. Specifically, we provide explicit high-probability bounds on the error of our estimator that hold uniformly over a policy class and depend on the problem's dimension and policy class's complexity. Our bounds show that under mild conditions, the error of our estimator vanishes as the dimension of the optimization problem grows, even if the amount of available data remains small and constant. Said differently, we prove our estimator performs well in the small-data, large-scale regime. Finally, we numerically compare our proposed method to state-of-the-art approaches through a case study on dispatching emergency medical response services using real data. Our method provides more accurate estimates of out-of-sample performance and learns better-performing policies.
    STICC: A multivariate spatial clustering method for repeated geographic pattern discovery with consideration of spatial contiguity. (arXiv:2203.09611v2 [cs.LG] UPDATED)
Spatial clustering has been widely used for spatial data mining and knowledge discovery. An ideal multivariate spatial clustering should consider both spatial contiguity and aspatial attributes. Existing spatial clustering approaches may face challenges in discovering repeated geographic patterns while maintaining spatial contiguity. In this paper, we propose a Spatial Toeplitz Inverse Covariance-Based Clustering (STICC) method that considers both the attributes and the spatial relationships of geographic objects for multivariate spatial clustering. A subregion is created for each geographic object, serving as the basic unit when performing clustering. A Markov random field is then constructed to characterize the attribute dependencies of subregions. Using a spatial consistency strategy, nearby objects are encouraged to belong to the same cluster. To test the performance of the proposed STICC algorithm, we apply it in two use cases. Comparison results with several baseline methods show that STICC significantly outperforms the others in terms of adjusted Rand index and macro-F1 score. Join count statistics are also calculated and show that spatial contiguity is well preserved by STICC. Such a spatial clustering method may benefit various applications in geography, remote sensing, transportation, and urban planning.
    Image Compression and Actionable Intelligence With Deep Neural Networks. (arXiv:2203.13686v2 [cs.LG] UPDATED)
If a unit cannot receive intelligence from a source due to external factors, we consider it a disadvantaged user. We categorize this as a preoccupied unit working on a low-connectivity device on the edge. This case requires a different approach to delivering intelligence, particularly satellite imagery information, than normally employed. To address this, we propose a survey of information reduction techniques to deliver the information from a satellite image in a smaller package. We investigate four techniques to aid in the reduction of delivered information: traditional image compression, neural network image compression, object detection image cutout, and image-to-caption. Each of these mechanisms has its benefits and tradeoffs when considered for a disadvantaged user.
    Continual Learning for Unsupervised Anomaly Detection in Continuous Auditing of Financial Accounting Data. (arXiv:2112.13215v2 [cs.LG] UPDATED)
International audit standards require the direct assessment of a financial statement's underlying accounting journal entries. Driven by advances in artificial intelligence, deep-learning-inspired audit techniques have emerged to examine vast quantities of journal entry data. However, in regular audits, most of the proposed methods are applied to learn from a comparably stationary journal entry population, e.g., of a financial quarter or year, ignoring situations where audit-relevant distribution changes are not evident in the training data or become incrementally available over time. In contrast, in continuous auditing, deep-learning models are continually trained on a stream of recorded journal entries, e.g., of the last hour, resulting in situations where previous knowledge interferes with new information and is entirely overwritten. This work proposes a continual anomaly detection framework designed to overcome both challenges and to learn from a stream of journal entry data experiences. The framework is evaluated on deliberately designed audit scenarios and two real-world datasets. Our experimental results provide initial evidence that such a learning scheme offers the ability to reduce false-positive alerts and false-negative decisions.
    Radial Autoencoders for Enhanced Anomaly Detection. (arXiv:2203.15884v2 [cs.LG] UPDATED)
In classification problems, supervised machine-learning methods outperform traditional algorithms, thanks to the ability of neural networks to learn complex patterns. However, in two-class classification tasks like anomaly or fraud detection, unsupervised methods can do even better, because their predictions are not limited to previously learned types of anomalies. An intuitive approach to anomaly detection is based on the distances from the centers of mass of the two respective classes. Autoencoders, although trained without supervision, can also detect anomalies: considering the center of mass of the normal points, reconstructions now have radii, with the largest radii most likely indicating anomalous points. Of course, radius-based classification was already possible without interposing an autoencoder; in any space, radial classification can be operated to some extent. In order to outperform it, we combine radial deformations of the data (i.e., centric compressions or expansions of axes) with autoencoder training. Any autoencoder that makes use of a data center is here called a centric autoencoder (cAE). A special type is the cAE trained with a uniformly compressed dataset, named the centripetal autoencoder (cpAE). The new concept is studied here on a schematic artificial dataset, and the derived methods show consistent score improvements. Tested on real banking data, however, our radial-deformation supervised algorithms alone still perform better than cAEs, as expected from most supervised methods; nonetheless, in hybrid approaches, cAEs can be combined with a radial deformation of space, improving their classification score. We expect that centric autoencoders will become irreplaceable tools in geometry-based live anomaly detection, thanks to their ability to build naturally on geometrical algorithms and their native capability of detecting unknown anomaly types.
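The radius-based baseline the abstract starts from can be sketched in a few lines: score each point by its distance from the center of mass of the normal training points, and flag large radii as likely anomalies. The synthetic data and the 99th-percentile thresholding rule below are illustrative assumptions, not the paper's full method.

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 2))      # "normal" class
anomalies = rng.normal(6.0, 1.0, size=(5, 2))     # far-away points

center = normal.mean(axis=0)                      # center of mass of normal points

def radius(x):
    """Distance of each point from the normal-class center."""
    return np.linalg.norm(x - center, axis=-1)

# Flag anything beyond the 99th-percentile radius of the normal points.
threshold = np.quantile(radius(normal), 0.99)
flags = radius(anomalies) > threshold
print(flags.sum(), "of", len(anomalies), "anomalies flagged")
```

The paper's contribution is to improve on exactly this kind of score by deforming the space radially and training a (centric) autoencoder on the deformed data.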
    Deep Reinforcement Learning for Resource Constrained Multiclass Scheduling in Wireless Networks. (arXiv:2011.13634v3 [cs.LG] UPDATED)
The problem of resource-constrained scheduling in a dynamic and heterogeneous wireless setting is considered here. In our setup, the available limited bandwidth resources are allocated in order to serve randomly arriving service demands, which in turn belong to different classes in terms of payload data requirement, delay tolerance, and importance/priority. In addition to heterogeneous traffic, another major challenge stems from random service rates due to time-varying wireless communication channels. Various approaches for scheduling and resource allocation can be used, ranging from simple greedy heuristics and constrained optimization to combinatorics. Those methods are tailored to specific network or application configurations and are usually suboptimal. To this end, we resort to deep reinforcement learning (DRL) and propose a distributional Deep Deterministic Policy Gradient (DDPG) algorithm combined with Deep Sets to tackle the aforementioned problem. Furthermore, we present a novel way to use a Dueling Network, which leads to further performance improvement. Our proposed algorithm is tested on both synthetic and real data, showing consistent gains against state-of-the-art conventional methods from combinatorics, optimization, and scheduling metrics.
    ME-CapsNet: A Multi-Enhanced Capsule Networks with Routing Mechanism. (arXiv:2203.15547v3 [cs.CV] UPDATED)
Convolutional neural networks rely on the construction of informative features, which are determined by the channel-wise and spatial-wise information at each of the network's layers. In this research, we focus on a novel solution that uses sophisticated optimization to enhance both the spatial and channel components inside each layer's receptive field. Capsule networks were used to understand the spatial association between features in the feature map. Standalone capsule networks have shown better results on comparatively simple datasets than on complex datasets, as a result of the inordinate amount of feature information. Thus, to tackle this issue, we propose ME-CapsNet, which introduces deeper convolutional layers to extract important features before passing them strategically through modules of capsule layers, significantly improving the performance of the network. The deeper convolutional layers include blocks of Squeeze-and-Excitation networks which use a stochastic sampling approach for progressively reducing the spatial size, thereby dynamically recalibrating the channels by reconstructing their interdependencies without much loss of important feature information. Extensive experimentation on commonly used datasets demonstrates the efficiency of the proposed ME-CapsNet, which clearly outperforms various previous works by achieving higher accuracy with minimal model complexity on complex datasets.
    BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning. (arXiv:2203.01522v2 [cs.CV] CROSS LISTED)
Despite the success of deep neural networks, there are still many challenges in deep representation learning due to data scarcity issues such as data imbalance, unseen distributions, and domain shift. To address these issues, a variety of methods have been devised to explore sample relationships in a vanilla way (i.e., from the perspective of either the input or the loss function), failing to explore the internal structure of deep neural networks for learning with sample relationships. Inspired by this, we propose to equip deep neural networks themselves with the ability to learn sample relationships from each mini-batch. Specifically, we introduce a batch transformer module, BatchFormer, which is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training. By doing this, the proposed method enables the collaboration of different samples; e.g., head-class samples can also contribute to the learning of tail classes for long-tailed recognition. Furthermore, to mitigate the gap between training and testing, we use a shared classifier during training both with and without the BatchFormer, so that the module can be removed during testing. We perform extensive experiments on over ten datasets, and the proposed method achieves significant improvements on different data-scarcity applications without any bells and whistles, including the tasks of long-tailed recognition, compositional zero-shot learning, domain generalization, and contrastive learning. Code will be made publicly available at https://github.com/zhihou7/BatchFormer.
    Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis. (arXiv:2203.17263v1 [cs.CV])
    Since facial actions such as lip movements contain significant information about speech content, it is not surprising that audio-visual speech enhancement methods are more accurate than their audio-only counterparts. Yet, state-of-the-art approaches still struggle to generate clean, realistic speech without noise artifacts and unnatural distortions in challenging acoustic environments. In this paper, we propose a novel audio-visual speech enhancement framework for high-fidelity telecommunications in AR/VR. Our approach leverages audio-visual speech cues to generate the codes of a neural speech codec, enabling efficient synthesis of clean, realistic speech from noisy signals. Given the importance of speaker-specific cues in speech, we focus on developing personalized models that work well for individual speakers. We demonstrate the efficacy of our approach on a new audio-visual speech dataset collected in an unconstrained, large vocabulary setting, as well as existing audio-visual datasets, outperforming speech enhancement baselines on both quantitative metrics and human evaluation studies. Please see the supplemental video for qualitative results at https://github.com/facebookresearch/facestar/releases/download/paper_materials/video.mp4.
    CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition. (arXiv:2203.17023v1 [cs.SD])
Previous research has looked into ways to improve speech emotion recognition (SER) by utilizing both acoustic and linguistic cues of speech. However, the potential association between state-of-the-art ASR models and the SER task has yet to be investigated. In this paper, we propose a novel channel and temporal-wise attention RNN (CTA-RNN) architecture based on the intermediate representations of pre-trained ASR models. Specifically, the embeddings of a large-scale pre-trained end-to-end ASR encoder contain both acoustic and linguistic information, as well as the ability to generalize to different speakers, making them well suited for the downstream SER task. To further exploit the embeddings from different layers of the ASR encoder, we propose a novel CTA-RNN architecture to capture the emotionally salient parts of the embeddings in both the channel and temporal directions. We evaluate our approach on two popular benchmark datasets, IEMOCAP and MSP-IMPROV, using both within-corpus and cross-corpus settings. Experimental results show that our proposed method achieves excellent performance in terms of accuracy and robustness.
    Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement. (arXiv:2011.06392v2 [cs.SD] UPDATED)
    Recent neural Text-to-Speech (TTS) models have been shown to perform very well when enough data is available. However, fine-tuning them for new speakers or languages is not straightforward in a low-resource setup. In this paper, we show that by applying minor modifications to a Tacotron model, one can transfer an existing TTS model for new speakers from the same or a different language using only 20 minutes of data. For this purpose, we first introduce a base multi-lingual Tacotron with language-agnostic input, then demonstrate how transfer learning is done for different scenarios of speaker adaptation without exploiting any pre-trained speaker encoder or code-switching technique. We evaluate the transferred model in both subjective and objective ways.
    The paradox of the compositionality of natural language: a neural machine translation case study. (arXiv:2108.05885v2 [cs.CL] UPDATED)
    Obtaining human-like performance in NLP is often argued to require compositional generalisation. Whether neural networks exhibit this ability is usually studied by training models on highly compositional synthetic data. However, compositionality in natural language is much more complex than the rigid, arithmetic-like version such data adheres to, and artificial compositionality tests thus do not allow us to determine how neural models deal with more realistic forms of compositionality. In this work, we re-instantiate three compositionality tests from the literature and reformulate them for neural machine translation (NMT). Our results highlight that: i) unfavourably, models trained on more data are more compositional; ii) models are sometimes less compositional than expected, but sometimes more, exemplifying that different levels of compositionality are required, and models are not always able to modulate between them correctly; iii) some of the non-compositional behaviours are mistakes, whereas others reflect the natural variation in data. Apart from an empirical study, our work is a call to action: we should rethink the evaluation of compositionality in neural networks and develop benchmarks using real data to evaluate compositionality on natural language, where composing meaning is not as straightforward as doing the math.
    Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5). (arXiv:2203.13366v2 [cs.IR] UPDATED)
For a long period, different recommendation tasks have typically required designing task-specific architectures and training objectives. As a result, it is hard to transfer the learned knowledge and representations from one task to another, thus restricting the generalization ability of existing recommendation approaches; e.g., a sequential recommendation model can hardly be applied or transferred to a review generation method. To deal with such issues, considering that language grounding is a powerful medium to describe and represent various problems or tasks, we present a flexible and unified text-to-text paradigm called the "Pretrain, Personalized Prompt, and Predict Paradigm" (P5) for recommendation, which unifies various recommendation tasks in a shared framework. In P5, all data such as user-item interactions, item metadata, and user reviews are converted to a common format -- natural language sequences. The rich information from natural language assists P5 in capturing deeper semantics for recommendation. P5 learns different tasks with the same language modeling objective during pretraining. Thus, it has the potential to serve as the foundation model for downstream recommendation tasks, allows easy integration with other modalities, and enables instruction-based recommendation, which may move recommender systems towards a universal recommendation engine. With adaptive personalized prompts for different users, P5 is able to make predictions in a zero-shot or few-shot manner and largely reduces the need for extensive fine-tuning. We conduct experiments on several recommendation benchmarks to show the effectiveness of our generative approach. We will release our prompts and pretrained P5 language model to help advance future research on Recommendation as Language Processing (RLP) and Personalized Foundation Models.
    DiGS : Divergence guided shape implicit neural representation for unoriented point clouds. (arXiv:2106.10811v2 [cs.CV] UPDATED)
Shape implicit neural representations (INRs) have recently been shown to be effective in shape analysis and reconstruction tasks. Existing INRs require point coordinates to learn the implicit level sets of the shape. When a normal vector is available for each point, a higher-fidelity representation can be learned; however, normal vectors are often not provided as raw data. Furthermore, the method's initialization has been shown to play a crucial role in surface reconstruction. In this paper, we propose a divergence-guided shape representation learning approach that does not require normal vectors as input. We show that incorporating a soft constraint on the divergence of the distance function favours smooth solutions that reliably orient gradients to match the unknown normal at each point, in some cases even better than approaches that use ground-truth normal vectors directly. Additionally, we introduce a novel geometric initialization method for sinusoidal INRs that further improves convergence to the desired solution. We evaluate the effectiveness of our approach on the tasks of surface reconstruction and shape space learning and show SOTA performance compared to other unoriented methods. Code and model parameters are available at our project page https://chumbyte.github.io/DiGS-Site/.
    Schema matching using Gaussian mixture models with Wasserstein distance. (arXiv:2111.14244v2 [cs.LG] UPDATED)
Gaussian mixture models are a powerful tool, used mostly for clustering, but with proper preparation also for feature extraction, pattern recognition, image segmentation, and machine learning in general. When faced with the problem of schema matching, different mixture models computed on different pieces of data can maintain crucial information about the structure of the dataset. In order to measure or compare the results of mixture models, the Wasserstein distance can be very useful; however, it is not easy to calculate for mixture distributions. In this paper we derive one possible approximation of the Wasserstein distance between Gaussian mixture models and reduce it to a linear problem. Furthermore, application examples concerning real-world data are shown.
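The building block behind comparing Gaussian mixtures is the 2-Wasserstein distance between two individual Gaussians, which has a closed form. The sketch below restricts to diagonal covariances, where the matrix square roots reduce to elementwise square roots; the means and variances are made-up examples, and the paper's actual contribution (the approximation for full mixtures) is not reproduced here.

```python
import numpy as np

def w2_gaussian_diag(m1, v1, m2, v2):
    """2-Wasserstein distance between N(m1, diag(v1)) and N(m2, diag(v2)):
    W2^2 = ||m1 - m2||^2 + sum_i (sqrt(v1_i) - sqrt(v2_i))^2."""
    mean_term = np.sum((np.asarray(m1) - np.asarray(m2)) ** 2)
    cov_term = np.sum((np.sqrt(v1) - np.sqrt(v2)) ** 2)
    return np.sqrt(mean_term + cov_term)

d = w2_gaussian_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
print(d)  # identical covariances -> plain Euclidean distance between means: 5.0
```

For mixtures, these pairwise component distances feed into a transport problem over the mixture weights, which is the linear problem the paper reduces to.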
    DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools. (arXiv:2203.17275v1 [cs.LG])
    We consider the problem of sequential robotic manipulation of deformable objects using tools. Previous works have shown that differentiable physics simulators provide gradients to the environment state and help trajectory optimization to converge orders of magnitude faster than model-free reinforcement learning algorithms for deformable object manipulation. However, such gradient-based trajectory optimization typically requires access to the full simulator states and can only solve short-horizon, single-skill tasks due to local optima. In this work, we propose a novel framework, named DiffSkill, that uses a differentiable physics simulator for skill abstraction to solve long-horizon deformable object manipulation tasks from sensory observations. In particular, we first obtain short-horizon skills using individual tools from a gradient-based optimizer, using the full state information in a differentiable simulator; we then learn a neural skill abstractor from the demonstration trajectories which takes RGBD images as input. Finally, we plan over the skills by finding the intermediate goals and then solve long-horizon tasks. We show the advantages of our method in a new set of sequential deformable object manipulation tasks compared to previous reinforcement learning algorithms and compared to the trajectory optimizer.
    A statistical framework for efficient out of distribution detection in deep neural networks. (arXiv:2102.12967v3 [cs.LG] UPDATED)
Background. Commonly, Deep Neural Networks (DNNs) generalize well on samples drawn from a distribution similar to that of the training set. However, DNNs' predictions are brittle and unreliable when the test samples are drawn from a dissimilar distribution. This is a major concern for deployment in real-world applications, where such behavior may come at a considerable cost, such as industrial production lines, autonomous vehicles, or healthcare applications. Contributions. We frame Out Of Distribution (OOD) detection in DNNs as a statistical hypothesis testing problem. Tests generated within our proposed framework combine evidence from the entire network. Unlike previous OOD detection heuristics, this framework returns a $p$-value for each test sample. It is guaranteed to maintain the Type I Error (T1E - incorrectly predicting OOD for an actual in-distribution sample) for test data. Moreover, this allows combining several detectors while maintaining the T1E. Building on this framework, we suggest a novel OOD procedure based on low-order statistics. Our method achieves comparable or better results than state-of-the-art methods on well-accepted OOD benchmarks, without retraining the network parameters or assuming prior knowledge on the test distribution -- and at a fraction of the computational cost.
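The combination step described above can be illustrated with a classical recipe: assuming each per-detector test already yields a valid p-value, Fisher's method merges them into one chi-square statistic while preserving the Type I error level. This is a generic sketch, not the paper's specific low-order-statistics procedure, and the p-values below are invented.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values via Fisher's method.
    Returns the chi-square statistic -2 * sum(log p_i) and its degrees of
    freedom (2k); a large statistic (small combined p) flags the sample OOD."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, 2 * len(p_values)

stat, df = fisher_combine([0.01, 0.04, 0.30])
print(stat, df)
```

The combined p-value is then the survival function of a chi-square distribution with `df` degrees of freedom evaluated at `stat`.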
    Weighted Programming. (arXiv:2202.07577v2 [cs.PL] UPDATED)
    We study weighted programming, a programming paradigm for specifying mathematical models. More specifically, the weighted programs we investigate are like usual imperative programs with two additional features: (1) nondeterministic branching and (2) weighting execution traces. Weights can be numbers but also other objects like words from an alphabet, polynomials, formal power series, or cardinal numbers. We argue that weighted programming as a paradigm can be used to specify mathematical models beyond probability distributions (as is done in probabilistic programming). We develop weakest-precondition- and weakest-liberal-precondition-style calculi \`{a} la Dijkstra for reasoning about mathematical models specified by weighted programs. We present several case studies. For instance, we use weighted programming to model the ski rental problem - an optimization problem. We model not only the optimization problem itself, but also the best deterministic online algorithm for solving this problem as weighted programs. By means of weakest-precondition-style reasoning, we can determine the competitive ratio of the online algorithm on source code level.
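The ski rental case study mentioned above has a compact concrete form: rent at cost 1 per day or buy once at cost B, and the classic break-even strategy (rent for B-1 days, then buy) is (2 - 1/B)-competitive. The brute-force check below verifies that ratio for an assumed B; it illustrates the optimization problem the paper models, not the paper's weakest-precondition reasoning itself.

```python
B = 10  # purchase price, in units of the daily rent (assumed)

def alg_cost(days):
    """Cost of the break-even strategy over a season lasting `days` days:
    rent while days < B, otherwise rent B-1 days and then buy."""
    return days if days < B else (B - 1) + B

def opt_cost(days):
    """Offline optimum: either rent every day or buy on day one."""
    return min(days, B)

# Worst-case ratio of online to offline cost over all season lengths.
worst_ratio = max(alg_cost(d) / opt_cost(d) for d in range(1, 200))
print(worst_ratio)  # 2 - 1/B = 1.9
```

The paper's point is that both the problem and this online algorithm can be written as weighted programs, and the competitive ratio then falls out of weakest-precondition-style reasoning at the source-code level.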
    Hypergraph Convolutional Networks via Equivalency between Hypergraphs and Undirected Graphs. (arXiv:2203.16939v1 [cs.LG])
As a powerful tool for modeling complex relationships, hypergraphs are gaining popularity in the graph learning community. However, commonly used frameworks in deep hypergraph learning focus on hypergraphs with \textit{edge-independent vertex weights} (EIVWs), without considering hypergraphs with \textit{edge-dependent vertex weights} (EDVWs), which have more modeling power. To compensate for this, in this paper we present General Hypergraph Spectral Convolution (GHSC), a general learning framework that not only handles EDVW and EIVW hypergraphs but, more importantly, enables explicitly utilizing existing powerful Graph Convolutional Neural Networks (GCNNs), thereby largely easing the design of hypergraph neural networks. In this framework, the graph Laplacian of a given undirected GCNN is replaced with a unified hypergraph Laplacian that incorporates vertex weight information from a random-walk perspective by equating our defined generalized hypergraphs with simple undirected graphs. Extensive experiments from various domains, including social network analysis, visual object classification, and protein learning, demonstrate that the proposed framework achieves state-of-the-art performance.
    Model Agnostic Defence against Backdoor Attacks in Machine Learning. (arXiv:1908.02203v3 [cs.LG] UPDATED)
Machine Learning (ML) has automated a multitude of our day-to-day decision-making domains, such as education, employment, and driving automation. The continued success of ML largely depends on our ability to trust the models we use. Recently, a new class of attacks called backdoor attacks has been developed. These attacks undermine the user's trust in ML models. In this work, we present NEO, a model-agnostic framework to detect and mitigate such backdoor attacks in image classification ML models. For a given image classification model, our approach analyses the inputs it receives and determines whether the model is backdoored. In addition, we mitigate these attacks by determining the correct predictions for the poisoned images. An appealing feature of NEO is that it can, for the first time, isolate and reconstruct the backdoor trigger. NEO is also, to the best of our knowledge, the first defence methodology that is completely black-box. We have implemented NEO and evaluated it against three state-of-the-art poisoned models. These models include highly critical applications such as traffic sign detection (USTS) and facial detection. In our evaluation, we show that NEO can detect $\approx$88% of the poisoned inputs on average and is as fast as 4.4 ms per input image. We also reconstruct the poisoned input for the user to effectively test their systems.
    How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications. (arXiv:2203.16822v1 [eess.AS])
Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM) that can later be fine-tuned on downstream tasks, e.g., automatic speech recognition (ASR). Yet, few works have investigated the impact on performance when the data substantially differs between the pre-training and downstream fine-tuning phases (i.e., domain shift). We target this scenario by analyzing the robustness of Wav2Vec2.0 and XLS-R models on downstream ASR for a completely unseen domain, i.e., air traffic control (ATC) communications. We benchmark the proposed models on four challenging ATC test sets (signal-to-noise ratio varies between 5 and 20 dB). Relative word error rate (WER) reductions of 20% to 40% are obtained in comparison to hybrid-based state-of-the-art ASR baselines by fine-tuning E2E acoustic models with a small fraction of labeled data. We also study the impact of fine-tuning data size on WERs, going from 5 minutes (few-shot) to 15 hours.
    Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart. (arXiv:2105.14785v4 [cs.LG] UPDATED)
    Correctly classifying adversarial examples is an essential but challenging requirement for safely deploying machine learning models. As reported in RobustBench, even the state-of-the-art adversarially trained models struggle to exceed 67% robust test accuracy on CIFAR-10, which is far from practical. A complementary way towards robustness is to introduce a rejection option, allowing the model to not return predictions on uncertain inputs, where confidence is a commonly used certainty proxy. Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones. This intriguing property sheds light on using coupling strategies to better detect and reject adversarial examples. We evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks including adaptive ones, and demonstrate that the RR module is compatible with different adversarial training frameworks on improving robustness, with little extra computation. The code is available at https://github.com/P2333/Rectified-Rejection.
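The rejection option described above can be sketched with plain confidence as the certainty proxy: return a prediction only when the maximum softmax probability clears a threshold (the paper couples this with a rectified confidence, R-Con, which is not reproduced here). The logits and the threshold of 0.9 below are illustrative assumptions.

```python
import numpy as np

def predict_with_rejection(logits, threshold=0.9):
    """Return the predicted class index, or -1 (reject) when the maximum
    softmax probability falls below the threshold."""
    z = logits - logits.max()                 # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(probs.argmax()) if probs.max() >= threshold else -1

print(predict_with_rejection(np.array([5.0, 0.1, 0.1])))  # confident -> class 0
print(predict_with_rejection(np.array([1.0, 0.9, 0.8])))  # uncertain -> -1
```

Coupling a second metric, as the paper does, amounts to requiring both scores to agree before a prediction is accepted.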
    A Single-Timescale Method for Stochastic Bilevel Optimization. (arXiv:2102.04671v4 [math.OC] UPDATED)
Stochastic bilevel optimization generalizes classic stochastic optimization from the minimization of a single objective to the minimization of an objective function that depends on the solution of another optimization problem. Recently, stochastic bilevel optimization has regained popularity in emerging machine learning applications such as hyper-parameter optimization and model-agnostic meta-learning. To solve this class of stochastic optimization problems, existing methods require either double-loop or two-timescale updates, which are sometimes less efficient. This paper develops a new optimization method for a class of stochastic bilevel problems that we term the Single-Timescale stochAstic BiLevEl optimization (STABLE) method. STABLE runs in a single-loop fashion and uses a single-timescale update with a fixed batch size. To achieve an $\epsilon$-stationary point of the bilevel problem, STABLE requires ${\cal O}(\epsilon^{-2})$ samples in total; and to achieve an $\epsilon$-optimal solution in the strongly convex case, STABLE requires ${\cal O}(\epsilon^{-1})$ samples. To the best of our knowledge, this is the first bilevel optimization algorithm achieving the same order of sample complexity as the stochastic gradient descent method for single-level stochastic optimization.
    DeepFry: Identifying Vocal Fry Using Deep Neural Networks. (arXiv:2203.17019v1 [eess.AS])
    Vocal fry or creaky voice refers to a voice quality characterized by irregular glottal opening and low pitch. It occurs in diverse languages and is prevalent in American English, where it is used not only to mark phrase finality, but also sociolinguistic factors and affect. Due to its irregular periodicity, creaky voice challenges automatic speech processing and recognition systems, particularly for languages where creak is frequently used. This paper proposes a deep learning model to detect creaky voice in fluent speech. The model is composed of an encoder and a classifier trained together. The encoder takes the raw waveform and learns a representation using a convolutional neural network. The classifier is implemented as a multi-headed fully-connected network trained to detect creaky voice, voicing, and pitch, where the last two are used to refine creak prediction. The model is trained and tested on speech of American English speakers, annotated for creak by trained phoneticians. We evaluated the performance of our system using two encoders: one is tailored for the task, and the other is based on a state-of-the-art unsupervised representation. Results suggest our best-performing system has improved recall and F1 scores compared to previous methods on unseen data.
    Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain. (arXiv:2203.17004v1 [eess.AS])
    Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the complex short-time Fourier transform (STFT) domain, proposing a novel training task for speech enhancement using a complex-valued deep neural network. We derive this training task within the formalism of stochastic differential equations, thereby enabling the use of predictor-corrector samplers. We provide alternative formulations inspired by previous publications on using SGMs for speech enhancement, avoiding the need for any prior assumptions on the noise distribution and making the training task purely generative which, as we show, results in improved enhancement performance.
    Generation and Simulation of Synthetic Datasets with Copulas. (arXiv:2203.17250v1 [cs.LG])
    This paper proposes a new method to generate synthetic data sets based on copula models. Our goal is to produce surrogate data resembling real data in terms of marginal and joint distributions. We present a complete and reliable algorithm for generating a synthetic data set comprising numeric or categorical variables. Applying our methodology to two datasets shows better performance compared to other methods such as SMOTE and autoencoders.
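As a rough illustration of the copula idea (not the paper's algorithm), one can couple the empirical marginals of two real columns through a bivariate Gaussian copula: draw correlated Gaussians, push them through the normal CDF to get correlated uniforms, then map each uniform through the empirical quantile function of its column. Function names below are invented for the sketch.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def empirical_quantile(sorted_vals, u):
    """Inverse empirical CDF: the u-quantile order statistic."""
    i = min(int(u * len(sorted_vals)), len(sorted_vals) - 1)
    return sorted_vals[i]

def copula_synthesize(xs, ys, rho, n, seed=0):
    """Sample n surrogate (x, y) pairs whose marginals come from the real
    columns and whose dependence is a Gaussian copula with parameter rho."""
    rng = random.Random(seed)
    sx, sy = sorted(xs), sorted(ys)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append((empirical_quantile(sx, norm_cdf(z1)),
                    empirical_quantile(sy, norm_cdf(z2))))
    return out
```

Because every synthetic value is an order statistic of a real column, marginals are matched by construction; only the dependence structure is modeled parametrically.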
    Continuous Scene Representations for Embodied AI. (arXiv:2203.17251v1 [cs.CV])
We propose Continuous Scene Representations (CSR), a scene representation constructed by an embodied agent navigating within a space, where objects and their relationships are modeled by continuous valued embeddings. Our method captures feature relationships between objects, composes them into a graph structure on-the-fly, and situates an embodied agent within the representation. Our key insight is to embed pair-wise relationships between objects in a latent space. This allows for a richer representation compared to discrete relations (e.g., [support], [next-to]) commonly used for building scene representations. CSR can track objects as the agent moves in a scene, update the representation accordingly, and detect changes in room configurations. Using CSR, we outperform state-of-the-art approaches for the challenging downstream task of visual room rearrangement, without any task specific training. Moreover, we show the learned embeddings capture salient spatial details of the scene and show applicability to real world data. A summary video and code are available at https://prior.allenai.org/projects/csr.
    LEAD1.0: A Large-scale Annotated Dataset for Energy Anomaly Detection in Commercial Buildings. (arXiv:2203.17256v1 [cs.LG])
Modern buildings are densely equipped with smart energy meters, which periodically generate a massive amount of time-series data yielding a few million data points every day. This data can be leveraged to discover the underlying loads, infer their energy consumption patterns, inter-dependencies on environmental factors, and the building's operational properties. Furthermore, it allows us to simultaneously identify anomalies present in the electricity consumption profiles, which is a big step towards saving energy and achieving global sustainability. However, to date, the lack of large-scale annotated energy consumption datasets hinders the ongoing research in anomaly detection. We contribute to this effort by releasing a well-annotated version of the publicly available ASHRAE Great Energy Predictor III dataset containing 1,413 smart electricity meter time series spanning over one year. In addition, we benchmark eight state-of-the-art anomaly detection methods on our dataset and compare their performance.
    Learning from many trajectories. (arXiv:2203.17193v1 [cs.LG])
We initiate a study of supervised learning from many independent sequences ("trajectories") of non-independent covariates, reflecting tasks in sequence modeling, control, and reinforcement learning. Conceptually, our multi-trajectory setup sits between two traditional settings in statistical learning theory: learning from independent examples and learning from a single auto-correlated sequence. Our conditions for efficient learning generalize the former setting--trajectories must be non-degenerate in ways that extend standard requirements for independent examples. They do not require that trajectories be ergodic, long, or strictly stable. For linear least-squares regression, given $n$-dimensional examples produced by $m$ trajectories, each of length $T$, we observe a notable change in statistical efficiency as the number of trajectories increases from a few (namely $m \lesssim n$) to many (namely $m \gtrsim n$). Specifically, we establish that the worst-case error rate of this problem is $\Theta(n / m T)$ whenever $m \gtrsim n$. Meanwhile, when $m \lesssim n$, we establish a (sharp) lower bound of $\Omega(n^2 / m^2 T)$ on the worst-case error rate, realized by a simple, marginally unstable linear dynamical system. A key upshot is that, in domains where trajectories regularly reset, the error rate eventually behaves as if all of the examples were independent, drawn from their marginals. As a corollary of our analysis, we also improve guarantees for the linear system identification problem.
    Training strategy for a lightweight countermeasure model for automatic speaker verification. (arXiv:2203.17031v1 [cs.SD])
The countermeasure (CM) model is developed to protect Automatic Speaker Verification (ASV) systems from spoof attacks and prevent resulting personal information leakage. Based on practicality and security considerations, the CM model is usually deployed on edge devices, which have more limited computing resources and storage space than cloud-based systems. This work proposes training strategies for a lightweight CM model for ASV, using generalized end-to-end (GE2E) pre-training and adversarial fine-tuning to improve performance, and applying knowledge distillation (KD) to reduce the size of the CM model. In the evaluation phase of the ASVspoof 2021 Logical Access task, the lightweight ResNetSE model reaches min t-DCF 0.2695 and EER 3.54%. The lightweight student model uses only 22.5% of the parameters and 21.1% of the multiply-and-accumulate operations of the teacher model.
CatIss: An Intelligent Tool for Categorizing Issue Reports using Transformers. (arXiv:2203.17196v1 [cs.SE])
Users use Issue Tracking Systems to keep track of and manage issue reports in their repositories. An issue is a rich source of software information that contains different reports including a problem, a request for new features, or merely a question about the software product. As the number of these issues increases, it becomes harder to manage them manually. Thus, automatic approaches have been proposed to help facilitate the management of issue reports. This paper describes CatIss, an automatic CATegorizer of ISSue reports, which is built upon the Transformer-based pre-trained RoBERTa model. CatIss classifies issue reports into three main categories: Bug reports, Enhancement/feature requests, and Questions. First, the datasets provided for the NLBSE tool competition are cleaned and preprocessed. Then, the pre-trained RoBERTa model is fine-tuned on the preprocessed dataset. Evaluating CatIss on about 80 thousand issue reports from GitHub indicates that it performs very well, surpassing the competition baseline, TicketTagger, and achieving an F1-score of 87.2% (micro average). Additionally, as CatIss is trained on a wide set of repositories, it is a generic prediction model, hence applicable to any unseen software project or projects with little historical data. Scripts for cleaning the datasets, training CatIss, and evaluating the model are publicly available.
    DINE: Domain Adaptation from Single and Multiple Black-box Predictors. (arXiv:2104.01539v3 [cs.CV] UPDATED)
    To ease the burden of labeling, unsupervised domain adaptation (UDA) aims to transfer knowledge in previous and related labeled datasets (sources) to a new unlabeled dataset (target). Despite impressive progress, prior methods always need to access the raw source data and develop data-dependent alignment approaches to recognize the target samples in a transductive learning manner, which may raise privacy concerns from source individuals. Several recent studies resort to an alternative solution by exploiting the well-trained white-box model from the source domain, yet, it may still leak the raw data through generative adversarial learning. This paper studies a practical and interesting setting for UDA, where only black-box source models (i.e., only network predictions are available) are provided during adaptation in the target domain. To solve this problem, we propose a new two-step knowledge adaptation framework called DIstill and fine-tuNE (DINE). Taking into consideration the target data structure, DINE first distills the knowledge from the source predictor to a customized target model, then fine-tunes the distilled model to further fit the target domain. Besides, neural networks are not required to be identical across domains in DINE, even allowing effective adaptation on a low-resource device. Empirical results on three UDA scenarios (i.e., single-source, multi-source, and partial-set) confirm that DINE achieves highly competitive performance compared to state-of-the-art data-dependent approaches. Code is available at \url{https://github.com/tim-learn/DINE/}.
    Neural Q-learning for solving elliptic PDEs. (arXiv:2203.17128v1 [math.NA])
Solving high-dimensional partial differential equations (PDEs) is a major challenge in scientific computing. We develop a new numerical method for solving elliptic-type PDEs by adapting the Q-learning algorithm from reinforcement learning. Our "Q-PDE" algorithm is mesh-free and therefore has the potential to overcome the curse of dimensionality. Using a neural tangent kernel (NTK) approach, we prove that the neural network approximator for the PDE solution, trained with the Q-PDE algorithm, converges to the trajectory of an infinite-dimensional ordinary differential equation (ODE) as the number of hidden units $\rightarrow \infty$. For monotone PDEs (i.e., those given by monotone operators, which may be nonlinear), despite the lack of a spectral gap in the NTK, we then prove that the limit neural network, which satisfies the infinite-dimensional ODE, converges in $L^2$ to the PDE solution as the training time $\rightarrow \infty$. More generally, we prove that any fixed point of the wide-network limit of the Q-PDE algorithm is a solution of the PDE (not necessarily under the monotone condition). The numerical performance of the Q-PDE algorithm is studied for several elliptic PDEs.
    Ternary and Binary Quantization for Improved Classification. (arXiv:2203.16798v1 [cs.CV])
Dimension reduction and data quantization are two important methods for reducing data complexity. In this paper, we study the methodology of first reducing data dimension by random projection and then quantizing the projections to ternary or binary codes, which has been widely applied in classification. Usually, the quantization seriously degrades the accuracy of classification due to high quantization errors. Interestingly, however, we observe that the quantization can provide comparable and often superior accuracy when the data to be quantized are sparse features generated with common filters. Furthermore, this quantization property is maintained in the random projections of sparse features, if both the features and the random projection matrices are sufficiently sparse. By conducting extensive experiments, we validate and analyze this intriguing property.
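The pipeline studied here, sparse random projection followed by ternarization, is easy to sketch. The sketch below is a minimal illustration with invented names and a dense loop (a real implementation would use sparse matrix algebra); the threshold `tau` is an arbitrary choice, not the paper's.

```python
import random

def sparse_random_projection(x, k, density=0.1, seed=0):
    """Project vector x to k dimensions with a sparse random matrix whose
    entries are +1, -1 (each with probability density/2) or 0 otherwise."""
    rng = random.Random(seed)
    out = []
    for _ in range(k):
        s = 0.0
        for v in x:
            r = rng.random()
            if r < density / 2.0:
                s += v
            elif r < density:
                s -= v
        out.append(s)
    return out

def ternarize(z, tau):
    """Quantize projections to {-1, 0, +1} codes with dead-zone threshold tau."""
    return [1 if v > tau else -1 if v < -tau else 0 for v in z]
```

A binary code is the special case `tau = 0` with the zero symbol dropped; classification then operates on the short codes instead of the raw features.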
    Performative Power. (arXiv:2203.17232v1 [cs.LG])
    We introduce the notion of performative power, which measures the ability of a firm operating an algorithmic system, such as a digital content recommendation platform, to steer a population. We relate performative power to the economic theory of market power. Traditional economic concepts are well known to struggle with identifying anti-competitive patterns in digital platforms--a core challenge is the difficulty of defining the market, its participants, products, and prices. Performative power sidesteps the problem of market definition by focusing on a directly observable statistical measure instead. High performative power enables a platform to profit from steering participant behavior, whereas low performative power ensures that learning from historical data is close to optimal. Our first general result shows that under low performative power, a firm cannot do better than standard supervised learning on observed data. We draw an analogy with a firm being a price-taker, an economic condition that arises under perfect competition in classical market models. We then contrast this with a market where performative power is concentrated and show that the equilibrium state can differ significantly. We go on to study performative power in a concrete setting of strategic classification where participants can switch between competing firms. We show that monopolies maximize performative power and disutility for the participant, while competition and outside options decrease performative power. We end on a discussion of connections to measures of market power in economics and of the relationship with ongoing antitrust debates.
    Factored Adaptation for Non-Stationary Reinforcement Learning. (arXiv:2203.16582v1 [cs.LG])
Dealing with non-stationarity in environments (i.e., transition dynamics) and objectives (i.e., reward functions) is a challenging problem that is crucial in real-world applications of reinforcement learning (RL). Most existing approaches only focus on families of stationary MDPs, in which the non-stationarity is episodic, i.e., change is only possible across episodes. The few works that do consider non-stationarity without a specific boundary, i.e., that also allow for changes within an episode, model the changes monolithically in a single shared embedding vector. In this paper, we propose Factored Adaptation for Non-Stationary RL (FANS-RL), a factored adaptation approach that explicitly learns the individual latent change factors affecting the transition dynamics and reward functions. FANS-RL jointly learns the structure of a factored MDP and a factored representation of the time-varying change factors, as well as the specific state components that they affect, via a factored non-stationary variational autoencoder. Through this general framework, we can consider general non-stationary scenarios with different types of changing functions and changing frequencies. Experimental results demonstrate that FANS-RL outperforms existing approaches in terms of rewards, compactness of the latent state representation, and robustness to varying degrees of non-stationarity.
    A Closer Look at Rehearsal-Free Continual Learning. (arXiv:2203.17269v1 [cs.LG])
    Continual learning describes a setting where machine learning models learn novel concepts from continuously shifting training data, while simultaneously avoiding degradation of knowledge on previously seen classes (a phenomenon known as the catastrophic forgetting problem) which may disappear from the training data for extended periods of time. Current approaches for continual learning of a single expanding task (aka class-incremental continual learning) require extensive rehearsal of previously seen data to avoid this degradation of knowledge. Unfortunately, rehearsal comes at a sharp cost to memory and computation, and it may also violate data-privacy. Instead, we explore combining knowledge distillation and parameter regularization in new ways to achieve strong continual learning performance without rehearsal. Specifically, we take a deep dive into common continual learning techniques: prediction distillation, feature distillation, L2 parameter regularization, and EWC parameter regularization. We first disprove the common assumption that parameter regularization techniques fail for rehearsal-free continual learning of a single, expanding task. Next, we explore how to leverage knowledge from a pre-trained model in rehearsal-free continual learning and find that vanilla L2 parameter regularization outperforms EWC parameter regularization and feature distillation. We then highlight the impact of the rehearsal-free continual learning settings with a classifier expansion benchmark, showing that a strategy based on our findings combined with a positive/negative label balancing heuristic can close the performance gap between the upper bound and the existing strategies by up to roughly 50%. Finally, we show that a simple method consisting of pre-training, L2 regularization, and prediction distillation can even outperform rehearsal-based methods on the common CIFAR-100 benchmark.
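The loss ingredients discussed in the abstract, task cross-entropy, L2 parameter regularization toward the previous model's weights, and prediction (logit) distillation, can be combined schematically as below. This is a numpy-only sketch under the assumption of flat parameter vectors and precomputed logits, not the authors' implementation; the weighting coefficients and temperature are placeholders.

```python
import numpy as np

def rehearsal_free_loss(logits, labels, old_logits, params, old_params,
                        lam_l2=1.0, lam_kd=1.0, T=2.0):
    """Schematic rehearsal-free continual learning loss:
    cross-entropy + L2 drift penalty + softened prediction distillation."""
    # cross-entropy on the current task
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    ce = -np.log(p[np.arange(len(labels)), labels]).mean()
    # L2 parameter regularization toward the frozen previous weights
    l2 = np.sum((params - old_params) ** 2)
    # prediction distillation: cross-entropy against softened old outputs
    q_old = np.exp(old_logits / T)
    q_old /= q_old.sum(axis=1, keepdims=True)
    q_new = np.exp(logits / T)
    q_new /= q_new.sum(axis=1, keepdims=True)
    kd = -(q_old * np.log(q_new + 1e-12)).sum(axis=1).mean()
    return ce + lam_l2 * l2 + lam_kd * kd
```

In a rehearsal-free setting the L2 and distillation terms are the only channels through which knowledge of old classes is preserved, since no old examples are replayed.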
    RobIn: A Robust Interpretable Deep Network for Schizophrenia Diagnosis. (arXiv:2203.17085v1 [cs.LG])
    Schizophrenia is a severe mental health condition that requires a long and complicated diagnostic process. However, early diagnosis is vital to control symptoms. Deep learning has recently become a popular way to analyse and interpret medical data. Past attempts to use deep learning for schizophrenia diagnosis from brain-imaging data have shown promise but suffer from a large training-application gap - it is difficult to apply lab research to the real world. We propose to reduce this training-application gap by focusing on readily accessible data. We collect a data set of psychiatric observations of patients based on DSM-5 criteria. Because similar data is already recorded in all mental health clinics that diagnose schizophrenia using DSM-5, our method could be easily integrated into current processes as a tool to assist clinicians, whilst abiding by formal diagnostic criteria. To facilitate real-world usage of our system, we show that it is interpretable and robust. Understanding how a machine learning tool reaches its diagnosis is essential to allow clinicians to trust that diagnosis. To interpret the framework, we fuse two complementary attention mechanisms, 'squeeze and excitation' and 'self-attention', to determine global attribute importance and attribute interactivity, respectively. The model uses these importance scores to make decisions. This allows clinicians to understand how a diagnosis was reached, improving trust in the model. Because machine learning models often struggle to generalise to data from different sources, we perform experiments with augmented test data to evaluate the model's applicability to the real world. We find that our model is more robust to perturbations, and should therefore perform better in a clinical setting. It achieves 98% accuracy with 10-fold cross-validation.
    Cross-modal Learning of Graph Representations using Radar Point Cloud for Long-Range Gesture Recognition. (arXiv:2203.17066v1 [eess.SP])
Gesture recognition is one of the most intuitive ways of interaction and has gathered particular attention in human-computer interaction. Radar sensors possess multiple intrinsic properties, such as their ability to work in low illumination and harsh weather conditions, and being low-cost and compact, making them highly preferable for a gesture recognition solution. However, most work in the literature focuses on solutions with a limited range of less than a meter. We propose a novel architecture for a long-range (1m - 2m) gesture recognition solution that leverages a point cloud-based cross-learning approach from camera point cloud to 60-GHz FMCW radar point cloud, which allows learning better representations while suppressing noise. We use a variant of Dynamic Graph CNN (DGCNN) for the cross-learning, enabling us to model relationships between the points at a local and global level; to model the temporal dynamics, a Bi-LSTM network is employed. In the experimental results section, we demonstrate our model's overall accuracy of 98.4% for five gestures and its generalization capability.
    PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations. (arXiv:2203.16965v1 [cs.CL])
    While self-supervised speech representation learning (SSL) models serve a variety of downstream tasks, these models have been observed to overfit to the domain from which the unlabelled data originates. To alleviate this issue, we propose PADA (Pruning Assisted Domain Adaptation) and zero out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data. Intuitively, this helps to make space for the target-domain ASR finetuning. The redundant weights can be identified through various pruning strategies which have been discussed in detail as a part of this work. Specifically, we investigate the effect of the recently discovered Task-Agnostic and Task-Aware pruning on PADA and propose a new pruning paradigm based on the latter, which we call Cross-Domain Task-Aware Pruning (CD-TAW). CD-TAW obtains the initial pruning mask from a well fine-tuned OOD model, which makes it starkly different from the rest of the pruning strategies discussed in the paper. Our proposed CD-TAW methodology achieves up to 20.6% relative WER improvement over our baseline when fine-tuned on a 2-hour subset of Switchboard data without language model (LM) decoding. Furthermore, we conduct a detailed analysis to highlight the key design choices of our proposed method.
    Mutual information estimation for graph convolutional neural networks. (arXiv:2203.16887v1 [cs.LG])
    Measuring model performance is a key issue for deep learning practitioners. However, we often lack the ability to explain why a specific architecture attains superior predictive accuracy for a given data set. Often, validation accuracy is used as a performance heuristic quantifying how well a network generalizes to unseen data, but it does not capture anything about the information flow in the model. Mutual information can be used as a measure of the quality of internal representations in deep learning models, and the information plane may provide insights into whether the model exploits the available information in the data. The information plane has previously been explored for fully connected neural networks and convolutional architectures. We present an architecture-agnostic method for tracking a network's internal representations during training, which are then used to create the mutual information plane. The method is exemplified for graph-based neural networks fitted on citation data. We compare how the inductive bias introduced in graph-based architectures changes the mutual information plane relative to a fully connected neural network.
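The simplest way to draw such an information plane is a histogram (binning) estimator of mutual information between a layer's activations and the inputs or labels. A minimal sketch for scalar variables follows; the function name and default bin count are arbitrary, and binning is only one of several estimators used in the information-plane literature.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimator of I(X; Y) in nats:
    I = sum_ij p(i,j) * log( p(i,j) / (p(i) p(j)) )."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal over y
    py = pxy.sum(axis=0, keepdims=True)   # marginal over x
    nz = pxy > 0                          # 0 * log 0 := 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

Tracking this quantity for each layer across training epochs, with X the input and Y the labels, yields the two coordinates of the information plane. Note that the binning estimator is biased upward for small samples, which matters when comparing architectures.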
    Doubly-Robust Estimation for Unbiased Learning-to-Rank from Position-Biased Click Feedback. (arXiv:2203.17118v1 [cs.LG])
    Clicks on rankings suffer from position bias: generally items on lower ranks are less likely to be examined - and thus clicked - by users, in spite of their actual preferences between items. The prevalent approach to unbiased click-based Learning-to-Rank (LTR) is based on counterfactual Inverse-Propensity-Scoring (IPS) estimation. Unique about LTR is the fact that standard Doubly-Robust (DR) estimation - which combines IPS with regression predictions - is inapplicable since the treatment variable - indicating whether a user examined an item - cannot be observed in the data. In this paper, we introduce a novel DR estimator that uses the expectation of treatment per rank instead. Our novel DR estimator has more robust unbiasedness conditions than the existing IPS approach, and in addition, provides enormous decreases in variance: our experimental results indicate it requires several orders of magnitude fewer datapoints to converge at optimal performance. For the unbiased LTR field, our DR estimator contributes both increases in state-of-the-art performance and the most robust theoretical guarantees of all known LTR estimators.
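The IPS baseline the paper improves on can be checked with a tiny Monte-Carlo simulation: clicks are generated as examination times relevance, so dividing each click by the examination propensity at its rank recovers relevance in expectation. This sketch illustrates only the IPS part; the paper's DR estimator additionally combines IPS with regression predictions weighted by the expected treatment per rank. All names here are illustrative.

```python
import random

def simulate_ips(relevance, examine_prob, n_sessions=20000, seed=0):
    """Position-biased click simulation with IPS correction.

    At rank r an item is examined with probability examine_prob[r] and,
    if examined, clicked with probability relevance[r]. The IPS estimate
    divides each click by the examination propensity, removing the bias."""
    rng = random.Random(seed)
    est = [0.0] * len(relevance)
    for _ in range(n_sessions):
        for r in range(len(relevance)):
            examined = rng.random() < examine_prob[r]
            clicked = examined and rng.random() < relevance[r]
            if clicked:
                est[r] += 1.0 / examine_prob[r]
    return [e / n_sessions for e in est]
```

Note how the variance grows at lower ranks, where the propensity is small and each click receives a large weight: this is exactly the variance problem that the proposed DR estimator is designed to shrink.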
    Lossless Speedup of Autoregressive Translation with Generalized Aggressive Decoding. (arXiv:2203.16487v2 [cs.CL] UPDATED)
In this paper, we propose Generalized Aggressive Decoding (GAD) -- a novel approach to accelerating autoregressive translation with no quality loss, through the collaboration of the autoregressive and non-autoregressive translation (NAT) modes of the Transformer. At each decoding iteration, GAD aggressively decodes a number of tokens in parallel as a draft through NAT and then verifies them in the autoregressive manner, where only the tokens that pass the verification are kept as decoded tokens. GAD can achieve the same performance as autoregressive translation but much more efficiently because both NAT drafting and autoregressive verification are fast due to parallel computing. We conduct experiments on the WMT14 English-German translation task and confirm that vanilla GAD yields exactly the same results as greedy decoding with an around 3x speedup, and that its variant (GAD++) with an advanced verification strategy not only outperforms greedy translation and even achieves translation quality comparable to beam search, but also further improves the decoding speed, resulting in an around 5x speedup over autoregressive translation. Our models and code are available at https://github.com/hemingkx/Generalized-Aggressive-Decoding.
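The draft-then-verify loop can be sketched abstractly. In this toy version the verifier is called once per token, so it demonstrates only the correctness guarantee (output identical to greedy decoding); the real speedup comes from the verifier scoring the whole draft block in one parallel forward pass. The interface below is invented, not the paper's code.

```python
def aggressive_decode(draft_fn, verify_fn, prefix, max_len=20):
    """Toy GAD-style loop.

    draft_fn(out)  -> a block of proposed next tokens (the NAT drafter).
    verify_fn(out) -> the single greedy next token (the AR verifier).

    The verifier's token is always the one kept, so the output equals
    plain greedy decoding; the draft only controls how many tokens are
    accepted per iteration (and hence the speedup, when verification
    of the block is done in parallel)."""
    out = list(prefix)
    while len(out) < max_len:
        for tok in draft_fn(out):
            if len(out) >= max_len:
                break
            expected = verify_fn(out)   # greedy next token
            out.append(expected)
            if tok != expected:
                break                   # rest of the draft is discarded
    return out
```

When the drafter is accurate, whole blocks are accepted at once; when it errs, decoding falls back to one verified token and redrafts from there.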
    Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$. (arXiv:2203.17189v1 [cs.LG])
    Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: $\texttt{t5x}$ simplifies the process of building and training large language models at scale while maintaining ease of use, and $\texttt{seqio}$ provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. $\texttt{t5x}$ and $\texttt{seqio}$ are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.
    Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data. (arXiv:2203.17113v1 [cs.SD])
This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-decoder based automatic speech recognition (ASR). Within a multi-task learning framework, we introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes, derived from an offline clustering model. One is to predict the pseudo codes via masked language modeling on the encoder output, as in the HuBERT model, while the other lets the decoder learn to reconstruct the pseudo codes autoregressively instead of generating textual transcripts. In this way, the decoder learns to reconstruct original speech information with codes before learning to generate correct text. Comprehensive experiments on the LibriSpeech corpus show that the proposed Speech2C can relatively reduce the word error rate (WER) by 19.2% over the method without decoder pre-training, and also significantly outperforms the state-of-the-art wav2vec 2.0 and HuBERT on the 10h and 100h fine-tuning subsets.
    Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors. (arXiv:2203.17138v1 [cs.RO])
    We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our approach does not require extensive reward engineering to produce sensible and natural looking behavior at the time of reuse. This makes it easy to create well-regularized, task-oriented controllers that are suitable for deployment on real robots. We demonstrate how our skill module can be used for imitation, and train controllable walking and ball dribbling policies for both the ANYmal quadruped and OP3 humanoid. These policies are then deployed on hardware via zero-shot simulation-to-reality transfer. Accompanying videos are available at https://bit.ly/robot-npmp.
    Quantum-Aided Meta-Learning for Bayesian Binary Neural Networks via Born Machines. (arXiv:2203.17089v1 [quant-ph])
    Near-term noisy intermediate-scale quantum circuits can efficiently implement implicit probabilistic models in discrete spaces, supporting distributions that are practically infeasible to sample from using classical means. One of the possible applications of such models, also known as Born machines, is probabilistic inference, which is at the core of Bayesian methods. This paper studies the use of Born machines for the problem of training binary Bayesian neural networks. In the proposed approach, a Born machine is used to model the variational distribution of the binary weights of the neural network, and data from multiple tasks is used to reduce training data requirements on new tasks. The method combines gradient-based meta-learning and variational inference via Born machines, and is shown in a prototypical regression problem to outperform conventional joint learning strategies.
    Traffic4cast at NeurIPS 2021 - Temporal and Spatial Few-Shot Transfer Learning in Gridded Geo-Spatial Processes. (arXiv:2203.17070v1 [cs.LG])
The IARAI Traffic4cast competitions at NeurIPS 2019 and 2020 showed that neural networks can successfully predict future traffic conditions 1 hour into the future from simply aggregated GPS probe data binned in time and space. We thus reinterpreted the challenge of forecasting traffic conditions as a movie completion task. U-Nets proved to be the winning architecture, demonstrating an ability to extract relevant features in this complex real-world geo-spatial process. Building on the previous competitions, Traffic4cast 2021 now focuses on the question of model robustness and generalizability across time and space. Moving from one city to an entirely different city, or moving from pre-COVID times to times after COVID hit the world, thus introduces a clear domain shift. We thus, for the first time, release data featuring such domain shifts. The competition now covers ten cities over 2 years, providing data compiled from over 10^12 GPS probe points. Winning solutions captured traffic dynamics sufficiently well to cope even with these complex domain shifts. Surprisingly, this seemed to require only the previous 1h of traffic dynamics history and the static road graph as input.
    SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. (arXiv:2203.17001v1 [eess.AS])
Deep learning based singing voice synthesis (SVS) systems have been demonstrated to flexibly generate singing with better quality, compared to conventional statistical parametric methods. However, neural systems are generally data-hungry and have difficulty reaching reasonable singing quality with the limited publicly available training data. In this work, we explore different data augmentation methods to boost the training of SVS systems, including several strategies customized to SVS based on pitch augmentation and mix-up augmentation. To further stabilize the training, we introduce the cycle-consistent training strategy. Extensive experiments on two public singing databases demonstrate that our proposed augmentation methods and the stabilizing training strategy can significantly improve performance on both objective and subjective evaluations.
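The two generic augmentations named in the abstract are easy to sketch in isolation: mix-up forms a convex combination of two training examples with a Beta-distributed weight, and pitch augmentation rescales an F0 contour by a semitone factor. These are minimal generic sketches, not the paper's SVS-specific variants; all names are illustrative.

```python
import random

def mixup(x1, x2, alpha=0.2, seed=None):
    """Mix-up: convex combination of two feature sequences with weight
    lam ~ Beta(alpha, alpha); in training the targets are mixed with the
    same lam. Returns the mixed sequence and lam."""
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)], lam

def pitch_shift_semitones(f0, semitones):
    """Pitch augmentation on an F0 contour (Hz): scale by 2^(s/12)."""
    k = 2.0 ** (semitones / 12.0)
    return [f * k for f in f0]
```

For singing voice, pitch augmentation has the attraction of generating musically valid variants (the melody transposes cleanly), which is presumably why it is singled out for SVS customization.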
    WavThruVec: Latent speech representation as intermediate features for neural speech synthesis. (arXiv:2203.16930v1 [cs.SD])
    Recent advances in neural text-to-speech research have been dominated by two-stage pipelines that use a low-level intermediate speech representation such as mel-spectrograms. However, such predetermined features are fundamentally limited, because they do not allow exploiting the full potential of a data-driven approach through learned hidden representations. For this reason, several end-to-end methods have been proposed. However, such models are harder to train and require a large number of high-quality recordings with transcriptions. Here, we propose WavThruVec, a two-stage architecture that resolves this bottleneck by using high-dimensional Wav2Vec 2.0 embeddings as the intermediate speech representation. Since these hidden activations provide high-level linguistic features, they are more robust to noise. That allows us to utilize lower-quality annotated speech datasets to train the first-stage module. At the same time, the second-stage component can be trained on large-scale untranscribed audio corpora, as Wav2Vec 2.0 embeddings are time-aligned and speaker-independent. This results in increased generalization to out-of-vocabulary words, as well as better generalization to unseen speakers. We show that the proposed model not only matches the quality of state-of-the-art neural models, but also presents useful properties that enable tasks like voice conversion and zero-shot synthesis.
    To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo. (arXiv:2203.16682v1 [cs.CV])
    We present a debiased dataset for the Person-centric Visual Grounding (PCVG) task first proposed by Cui et al. (2021) in the Who's Waldo dataset. Given an image and a caption, PCVG requires pairing up a person's name mentioned in the caption with a bounding box that points to the person in the image. We find that the original Who's Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image. Naturally, models trained on these biased data over-estimate performance on the benchmark. To force models to be right for the right reasons, we design automated tools to filter and debias the original dataset by ruling out all examples of insufficient context, such as those with no verb or with a long chain of conjunct names in their captions. Our experiments show that our new sub-sampled dataset contains less bias, with much lower heuristic performance and a widened gap between heuristic and supervised methods. We also demonstrate that the same benchmark model trained on our debiased training set outperforms one trained on the original biased (and larger) training set, when evaluated on our debiased test set. We argue our debiased dataset offers the PCVG task a more practical baseline for reliable benchmarking and future improvements.
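One of the biases described above can be made concrete as a trivial baseline; a hedged sketch (the box format and name extraction are assumptions, not the dataset's actual API):

```python
def left_to_right_heuristic(names, boxes):
    """Pair the i-th caption name with the i-th person box from the left.

    A sketch of the "sequence of names matches left-to-right image order" bias
    the paper identifies as exploitable. `boxes` are (x1, y1, x2, y2) tuples;
    returns {name: box_index}.
    """
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])  # sort by left edge
    return {name: order[i] for i, name in enumerate(names[: len(boxes)])}
```

On biased samples this caption-blind rule scores surprisingly well, which is exactly the over-estimation the debiasing targets.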
    ESGBERT: Language Model to Help with Classification Tasks Related to Companies Environmental, Social, and Governance Practices. (arXiv:2203.16788v1 [cs.CL])
    Environmental, Social, and Governance (ESG) are non-financial factors that are garnering attention from investors as they increasingly look to apply these as part of their analysis to identify material risks and growth opportunities. Some of this attention is also driven by clients who, now more aware than ever, demand that their money be managed and invested responsibly. As interest in ESG grows, so does the need for investors to have access to consumable ESG information. Since most of it is in text form in reports, disclosures, press releases, and 10-Q filings, we see a need for sophisticated NLP techniques for classification tasks on ESG text. We hypothesize that an ESG domain-specific pre-trained model will help with such tasks, and study building one in this paper. We explore doing this by fine-tuning BERT's pre-trained weights using ESG-specific text and then further fine-tuning the model for a classification task. We were able to achieve accuracy better than the original BERT and baseline models in environment-specific classification tasks.
    Flat-topped Probability Density Functions for Mixture Models. (arXiv:2203.17027v1 [cs.LG])
    This paper investigates probability density functions (PDFs) that are continuous everywhere, nearly uniform around the mode of the distribution, and adaptable to a variety of distribution shapes ranging from bell-shaped to rectangular. From the viewpoint of computational tractability, the PDF based on the Fermi-Dirac or logistic function is advantageous in estimating its shape parameters. The most appropriate PDF for an $n$-variate distribution is of the form: $p\left(\mathbf{x}\right)\propto\left[\cosh\left(\left[\left(\mathbf{x}-\mathbf{m}\right)^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\left(\mathbf{x}-\mathbf{m}\right)\right]^{n/2}\right)+\cosh\left(r^{n}\right)\right]^{-1}$ where $\mathbf{x},\mathbf{m}\in\mathbb{R}^{n}$, $\boldsymbol{\Sigma}$ is an $n\times n$ positive definite matrix, and $r>0$ is a shape parameter. The flat-topped PDFs can be used as components of mixture models in machine learning to improve goodness of fit while keeping a model as simple as possible.
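In the scalar case ($n = 1$, $\boldsymbol{\Sigma} = \sigma^2$), the unnormalized density above reduces to a one-liner; a minimal sketch:

```python
import math

def flat_top_density(x, m=0.0, sigma=1.0, r=2.0, n=1):
    """Unnormalized flat-topped density 1 / (cosh(q^{n/2}) + cosh(r^n)),

    with q = (x - m)^2 / sigma^2 in the scalar case. Near the mode the first
    cosh term is ~1, so the density is nearly flat; far away it decays quickly.
    """
    q = ((x - m) ** 2) / (sigma ** 2)
    return 1.0 / (math.cosh(q ** (n / 2.0)) + math.cosh(float(r) ** n))
```

A normalizing constant (and, for a mixture model, per-component weights) would be fitted on top of this kernel.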
    Multimodal Fusion Transformer for Remote Sensing Image Classification. (arXiv:2203.16952v1 [cs.CV])
    Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViT models in hyperspectral image (HSI) classification tasks, but without achieving satisfactory performance. In this paper, we introduce a new multimodal fusion transformer (MFT) network for HSI land-cover classification, which utilizes other sources of multimodal data in addition to HSI. Instead of using conventional feature fusion techniques, the other multimodal data are used as an external classification (CLS) token in the transformer encoder, which helps achieve better generalization. ViT and other similar transformer models use a randomly initialized external classification token and fail to generalize well. However, the use of a feature embedding derived from other sources of multimodal data, such as light detection and ranging (LiDAR), offers the potential to improve those models by means of such a CLS token. The concept of tokenization is used in our work to generate CLS and HSI patch tokens, helping to learn key features in a reduced feature space. We also introduce a new attention mechanism for improving the exchange of information between HSI tokens and the CLS (e.g., LiDAR) token. Extensive experiments are carried out on widely used benchmark datasets, i.e., the University of Houston, Trento, University of Southern Mississippi Gulfpark (MUUFL), and Augsburg. We compare the proposed MFT model with other state-of-the-art transformer models, classical CNN models, and conventional classifiers. The superior performance achieved by the proposed model is due to the use of multimodal information as external classification tokens.
    HiFi-VC: High Quality ASR-Based Voice Conversion. (arXiv:2203.16937v1 [cs.SD])
    The goal of voice conversion (VC) is to convert input voice to match the target speaker's voice while keeping the text and prosody intact. VC is commonly used in entertainment and speaking-aid systems, and is also applied to speech data generation and augmentation. The development of any-to-any VC systems, which are capable of generating voices unseen during model training, is of particular interest to both researchers and industry. Despite recent progress, any-to-any conversion quality is still inferior to natural speech. In this work, we propose a new any-to-any voice conversion pipeline. Our approach uses automated speech recognition (ASR) features, pitch tracking, and a state-of-the-art waveform prediction model. According to multiple subjective and objective evaluations, our method outperforms modern baselines in terms of voice quality, similarity, and consistency.
    It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher. (arXiv:2203.17008v1 [cs.CV])
    Model quantization is considered a promising method to greatly reduce the resource requirements of deep neural networks. To deal with the performance drop induced by quantization errors, a popular method is to use training data to fine-tune quantized networks. In real-world environments, however, such a method is frequently infeasible because training data is unavailable due to security, privacy, or confidentiality concerns. Zero-shot quantization addresses such problems, usually by taking information from the weights of a full-precision teacher network to compensate for the performance drop of the quantized networks. In this paper, we first analyze the loss surface of state-of-the-art zero-shot quantization techniques and provide several findings. In contrast to usual knowledge distillation problems, zero-shot quantization often suffers from 1) the difficulty of optimizing multiple loss terms together, and 2) poor generalization capability due to the use of synthetic samples. Furthermore, we observe that many weights fail to cross the rounding threshold while training the quantized networks even when it is necessary to do so for better performance. Based on these observations, we propose AIT, a simple yet powerful technique for zero-shot quantization, which addresses the aforementioned two problems in the following way: AIT i) uses a KL distance loss only, without a cross-entropy loss, and ii) manipulates gradients to guarantee that a certain portion of weights are properly updated after crossing the rounding thresholds. Experiments show that AIT outperforms many existing methods by a large margin, taking over the overall state-of-the-art position in the field.
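Component (i), a distillation loss that uses only the KL distance between teacher and student output distributions, can be sketched with softmax and KL written out explicitly (a generic formulation; AIT's gradient manipulation around rounding thresholds is not shown):

```python
import math

def kl_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) on temperature-softened distributions.

    Used alone, i.e. with no cross-entropy term against (synthetic) labels,
    which is the first ingredient of an AIT-style objective.
    """
    def softmax(zs):
        m = max(zs)  # subtract max for numerical stability
        exps = [math.exp((z - m) / temperature) for z in zs]
        s = sum(exps)
        return [e / s for e in exps]

    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The loss is zero iff the two distributions match, and strictly positive otherwise.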
    Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation. (arXiv:2203.16687v1 [cs.LG])
    Finding the best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al. (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks, which may enable searching tens of thousands of neural architectures without training. Mellor et al. used the Hamming distance evaluated over all ReLU neurons as such a measure. Motivated by these findings, in our work we ask whether other, perhaps more principled, measures exist that could be used as determinants of the success of a given neural architecture. In particular, we examine whether the dimensionality and quasi-orthogonality of a neural network's feature space could be correlated with the network's performance after training. We show, using the setup of Mellor et al., that dimensionality and quasi-orthogonality may jointly serve as discriminants of a network's performance. In addition to offering new opportunities to accelerate neural architecture search, our findings suggest important relationships between a network's final performance and properties of its randomly initialised feature space: data dimension and quasi-orthogonality.
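Quasi-orthogonality of a feature space can be probed with a simple statistic such as the mean absolute pairwise cosine similarity (a plausible proxy for illustration, not necessarily the paper's exact measure):

```python
import math

def quasi_orthogonality(vectors):
    """Mean |cosine similarity| over all pairs of feature vectors.

    0 means the vectors are mutually orthogonal (maximally quasi-orthogonal);
    1 means they are all collinear.
    """
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    n = len(vectors)
    sims = [abs(cos(vectors[i], vectors[j])) for i in range(n) for j in range(i + 1, n)]
    return sum(sims) / len(sims)
```

Applied to the feature activations of a randomly initialised network, a statistic like this could be computed without any training, in the spirit of the zero-cost measures discussed above.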
    The ideal data compression and automatic discovery of hidden law using neural network. (arXiv:2203.16941v1 [cs.LG])
    Recently, machine learning using neural networks has developed rapidly, and many new methods have been suggested. On the other hand, a truly versatile system has not been developed, and there remain many fields in which the human brain has advantages over machine learning. We considered how the human brain recognizes events and memorizes them, and succeeded in reproducing this system in a machine learning model with a new autoencoder neural network (NN). Previous autoencoders have the problem that they cannot define well what the features of the input data are, so the middle layer of the autoencoder must be restricted artificially. We solve this problem by defining a new loss function that reflects the information entropy, which enables the NN to compress the input data ideally and automatically discover the hidden law behind the input data set. The loss function used in our NN is based on the free-energy principle, known as the unified brain theory, and our study is the first concrete formularization of this principle. The results of this study can be applied to any kind of data analysis and also to cognitive science.
    An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks. (arXiv:2203.16773v1 [eess.AS])
    Speech representations learned from self-supervised learning (SSL) models have been found beneficial for various speech processing tasks. However, utilizing SSL representations usually requires fine-tuning the pre-trained models or designing task-specific downstream models and loss functions, causing much memory usage and human labor. On the other hand, prompting in Natural Language Processing (NLP) is an efficient and widely used technique to leverage pre-trained language models (LMs). Nevertheless, such a paradigm is little studied in the speech community. We report in this paper the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM). Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models. We further study the technique in challenging sequence generation tasks, where prompt tuning also demonstrates its potential; its limitations and possible research directions are discussed in this paper.
    Certified machine learning: A posteriori error estimation for physics-informed neural networks. (arXiv:2203.17055v1 [cs.LG])
    Physics-informed neural networks (PINNs) are one popular approach to introducing a priori knowledge about physical systems into the learning framework. PINNs are known to be robust for smaller training sets, to exhibit better generalization properties, and to be faster to train. In this paper, we show that using PINNs in comparison with purely data-driven neural networks is not only favorable for training performance but also allows us to extract significant information on the quality of the approximated solution. Assuming that the underlying differential equation for the PINN training is an ordinary differential equation, we derive a rigorous upper limit on the PINN prediction error. This bound is applicable even for input data not included in the training phase and without any prior knowledge about the true solution. Therefore, our a posteriori error estimation is an essential step towards certifying the PINN. We apply our error estimator to two academic toy problems, one of which falls in the category of model-predictive control and thereby shows the practical use of the derived results.
    Exploiting Explainable Metrics for Augmented SGD. (arXiv:2203.16723v1 [cs.LG])
    Explaining the generalization characteristics of deep learning is an emerging topic in advanced machine learning. There are several unanswered questions about how learning under stochastic optimization really works and why certain strategies are better than others. In this paper, we address the following question: \textit{can we probe intermediate layers of a deep neural network to identify and quantify the learning quality of each layer?} With this question in mind, we propose new explainability metrics that measure the redundant information in a network's layers using a low-rank factorization framework, and quantify a complexity measure that is highly correlated with the generalization performance of a given optimizer, network, and dataset. We subsequently exploit these metrics to augment the Stochastic Gradient Descent (SGD) optimizer by adaptively adjusting the learning rate in each layer to improve generalization performance. Our augmented SGD -- dubbed RMSGD -- introduces minimal computational overhead compared to SOTA methods and outperforms them by exhibiting strong generalization characteristics across application, architecture, and dataset.
    Assessing the risk of re-identification arising from an attack on anonymised data. (arXiv:2203.16921v1 [cs.LG])
    Objective: The use of routinely-acquired medical data for research purposes requires the protection of patient confidentiality via data anonymisation. The objective of this work is to calculate the risk of re-identification arising from a malicious attack to an anonymised dataset, as described below. Methods: We first present an analytical means of estimating the probability of re-identification of a single patient in a k-anonymised dataset of Electronic Health Record (EHR) data. Second, we generalize this solution to obtain the probability of multiple patients being re-identified. We provide synthetic validation via Monte Carlo simulations to illustrate the accuracy of the estimates obtained. Results: The proposed analytical framework for risk estimation provides re-identification probabilities that are in agreement with those provided by simulation in a number of scenarios. Our work is limited by conservative assumptions which inflate the re-identification probability. Discussion: Our estimates show that the re-identification probability increases with the proportion of the dataset maliciously obtained and that it has an inverse relationship with the equivalence class size. Our recursive approach extends the applicability domain to the general case of a multi-patient re-identification attack in an arbitrary k-anonymisation scheme. Conclusion: We prescribe a systematic way to parametrize the k-anonymisation process based on a pre-determined re-identification probability. We observed that the benefits of a reduced re-identification risk that come with increasing k-size may not be worth the reduction in data granularity when one is considering benchmarking the re-identification probability on the size of the portion of the dataset maliciously obtained by the adversary.
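The qualitative findings above (re-identification risk increasing with the leaked fraction and decreasing with the equivalence-class size k) can be reproduced with a deliberately simplistic toy Monte Carlo model, under which the analytic answer is f(1-f)^(k-1). This is an illustration only, not the paper's estimator, and its independence assumptions are much cruder than the paper's:

```python
import random

def reid_probability(k, f, trials=20000, seed=0):
    """Toy re-identification probability in a k-sized equivalence class.

    Model: each of the k records leaks independently with probability f; the
    target (record 0) is re-identified only if their record leaks while no
    other class member's record does (i.e. it is unique within the leak).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        leaked = [rng.random() < f for _ in range(k)]
        if leaked[0] and not any(leaked[1:]):
            hits += 1
    return hits / trials
```

The simulated value agrees with f(1-f)^(k-1), and increasing k drives the probability down, mirroring the inverse relationship reported above.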
    Equivariant Diffusion for Molecule Generation in 3D. (arXiv:2203.17003v1 [cs.LG])
    This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Our E(3) Equivariant Diffusion Model (EDM) learns to denoise a diffusion process with an equivariant network that jointly operates on both continuous (atom coordinates) and categorical features (atom types). In addition, we provide a probabilistic analysis which admits likelihood computation of molecules using our model. Experimentally, the proposed method significantly outperforms previous 3D molecular generative methods regarding the quality of generated samples and efficiency at training time.
    Task Adaptive Parameter Sharing for Multi-Task Learning. (arXiv:2203.16708v1 [cs.LG])
    Adapting pre-trained models with broad capabilities has become standard practice for learning a wide range of downstream tasks. The typical approach of fine-tuning different models for each task is performant, but incurs a substantial memory cost. To efficiently learn multiple downstream tasks we introduce Task Adaptive Parameter Sharing (TAPS), a general method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers. This enables multi-task learning while minimizing resources used and competition between tasks. TAPS solves a joint optimization problem which determines which layers to share with the base model and the value of the task-specific weights. Further, a sparsity penalty on the number of active layers encourages weight sharing with the base model. Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters. Moreover, TAPS is agnostic to the model architecture and requires only minor changes to the training scheme. We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
    Adaptive Estimation of Random Vectors with Bandit Feedback. (arXiv:2203.16810v1 [cs.LG])
    We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round. This reduces to learning an optimal subset for estimating the entire vector. Towards this, we first establish an exponential concentration bound for an estimate of the MSE for each observable subset. We then frame the estimation problem with bandit feedback in the best-subset identification setting. We propose a variant of the successive elimination algorithm to cater to the adaptive estimation problem, and we derive an upper bound on the sample complexity of this algorithm. In addition, to understand the fundamental limit on the sample complexity of this adaptive estimation bandit problem, we derive a minimax lower bound.
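The successive elimination template the authors adapt can be sketched for the simpler best-arm case (the paper's variant targets best-subset identification for MSE estimation, and the confidence radius here is only an illustrative choice):

```python
import math, random

def successive_elimination(pull, n_arms, delta=0.05, max_rounds=2000, seed=0):
    """Generic successive elimination for best-arm identification.

    Each round, every surviving arm is sampled once; arms whose upper
    confidence bound falls below the best arm's lower bound are eliminated.
    `pull(a, rng)` returns a reward in [0, 1] for arm a.
    """
    rng = random.Random(seed)
    active = list(range(n_arms))
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, max_rounds + 1):
        for a in active:
            x = pull(a, rng)
            counts[a] += 1
            means[a] += (x - means[a]) / counts[a]  # running mean
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        best = max(means[a] for a in active)
        active = [a for a in active if means[a] + radius >= best - radius]
        if len(active) == 1:
            break
    return active
```

The adaptive estimation setting replaces "arms" with observable m-subsets of the K-vector and "reward" with an MSE estimate, keeping the same elimination skeleton.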
    Generating High Fidelity Data from Low-density Regions using Diffusion Models. (arXiv:2203.17260v1 [cs.CV])
    Our work focuses on addressing sample deficiency from low-density regions of data manifold in common image datasets. We leverage diffusion process based generative models to synthesize novel images from low-density regions. We observe that uniform sampling from diffusion models predominantly samples from high-density regions of the data manifold. Therefore, we modify the sampling process to guide it towards low-density regions while simultaneously maintaining the fidelity of synthetic data. We rigorously demonstrate that our process successfully generates novel high fidelity samples from low-density regions. We further examine generated samples and show that the model does not memorize low-density data and indeed learns to generate novel samples from low-density regions.
    An Empirical Study of Language Model Integration for Transducer based Speech Recognition. (arXiv:2203.16776v1 [eess.AS])
    Utilizing text-only data with an external language model (LM) in end-to-end RNN-Transducer (RNN-T) speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal LM estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that the RNN-T posterior should first subtract the implicitly learned internal LM (ILM) prior in order to integrate the external LM. While recent studies suggest that RNN-T learns only some low-order language model information, the DR method uses a well-trained ILM. We hypothesize that this setting is inappropriate and may deteriorate the performance of the DR method, and propose a low-order density ratio method (LODR) that trains a low-order weak ILM for DR. Extensive empirical experiments are conducted on both in-domain and cross-domain scenarios on the English LibriSpeech & Tedlium-2 and Chinese WenetSpeech & AISHELL-1 datasets. It is shown that LODR consistently outperforms SF in all tasks, while performing generally close to ILME and better than DR in most tests.
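The scoring rules being compared share a simple per-hypothesis form; a sketch of the score combination (weights are illustrative, and with ilm_weight = 0 this reduces to shallow fusion):

```python
def density_ratio_score(log_p_rnnt, log_p_ext_lm, log_p_ilm,
                        lm_weight=0.3, ilm_weight=0.3):
    """Combine RNN-T, external LM, and internal LM log-scores for a hypothesis.

    Shallow fusion only adds the external LM; density-ratio / ILME-style
    methods also subtract an estimate of the internal LM prior. In LODR that
    estimate would come from a deliberately weak, low-order LM.
    """
    return log_p_rnnt + lm_weight * log_p_ext_lm - ilm_weight * log_p_ilm
```

In decoding, this score replaces the raw RNN-T log-posterior when ranking beam hypotheses.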
    Towards Driving-Oriented Metric for Lane Detection Models. (arXiv:2203.16851v1 [cs.CV])
    After the 2017 TuSimple Lane Detection Challenge, its dataset and evaluation based on accuracy and F1 score have become the de facto standard to measure the performance of lane detection methods. While they have played a major role in improving the performance of lane detection methods, the validity of this evaluation method in downstream tasks has not been adequately researched. In this study, we design 2 new driving-oriented metrics for lane detection: End-to-End Lateral Deviation metric (E2E-LD) is directly formulated based on the requirements of autonomous driving, a core downstream task of lane detection; Per-frame Simulated Lateral Deviation metric (PSLD) is a lightweight surrogate metric of E2E-LD. To evaluate the validity of the metrics, we conduct a large-scale empirical study with 4 major types of lane detection approaches on the TuSimple dataset and our newly constructed dataset Comma2k19-LD. Our results show that the conventional metrics have strongly negative correlations ($\leq$-0.55) with E2E-LD, meaning that some recent improvements purely targeting the conventional metrics may not have led to meaningful improvements in autonomous driving, but rather may actually have made it worse by overfitting to the conventional metrics. As autonomous driving is a security/safety-critical system, the underestimation of robustness hinders the sound development of practical lane detection models. We hope that our study will help the community achieve more downstream task-aware evaluations for lane detection.
    System Identification via Nuclear Norm Regularization. (arXiv:2203.16673v1 [stat.ML])
    This paper studies the problem of identifying low-order linear systems via Hankel nuclear norm regularization. Hankel regularization encourages low-rankness of the Hankel matrix, which maps to the low-orderness of the system. We provide novel statistical analysis for this regularization and carefully contrast it with the unregularized ordinary least-squares (OLS) estimator. Our analysis leads to new bounds on estimating the impulse response and the Hankel matrix associated with the linear system. We first design an input excitation and show that Hankel regularization enables one to recover the system using the optimal number of observations in the true system order and achieve strong statistical estimation rates. Surprisingly, we demonstrate that the input design indeed matters, by showing that intuitive choices such as an i.i.d. Gaussian input lead to provably sub-optimal sample complexity. To better understand the benefits of regularization, we also revisit the OLS estimator. Besides refining existing bounds, we experimentally identify when the regularized approach improves over OLS: (1) for low-order systems with slow impulse-response decay, the OLS method performs poorly in terms of sample complexity; (2) the Hankel matrix returned by regularization has a clearer singular-value gap that eases identification of the system order; (3) Hankel regularization is less sensitive to hyperparameter choice. Finally, we establish model selection guarantees through a joint train-validation procedure in which we tune the regularization parameter for near-optimal estimation.
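The object being regularized is straightforward to construct; a minimal sketch of the Hankel matrix of an impulse response:

```python
def hankel_matrix(impulse_response, rows):
    """Build the Hankel matrix H[i][j] = h[i + j] from an impulse response h.

    For a noiseless low-order linear system, rank(H) equals the system order,
    which is why penalizing the Hankel nuclear norm (sum of singular values)
    encourages low-orderness.
    """
    h = impulse_response
    cols = len(h) - rows + 1
    return [[h[i + j] for j in range(cols)] for i in range(rows)]
```

For the order-1 system h[k] = 2^k, every row is a scalar multiple of the first, i.e. the Hankel matrix has rank 1.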
    Predicting extreme events from data using deep machine learning: when and where. (arXiv:2203.17155v1 [cs.LG])
    We develop a deep convolutional neural network (DCNN) based framework for model-free prediction of the occurrence of extreme events both in time ("when") and in space ("where") in nonlinear physical systems of spatial dimension two. The measurements or data are a set of two-dimensional snapshots or images. For a desired time horizon of prediction, a proper labeling scheme can be designated to enable successful training of the DCNN and subsequent prediction of extreme events in time. Given that an extreme event has been predicted to occur within the time horizon, a space-based labeling scheme can be applied to predict, within certain resolution, the location at which the event will occur. We use synthetic data from the 2D complex Ginzburg-Landau equation and empirical wind speed data of the North Atlantic ocean to demonstrate and validate our machine-learning based prediction framework. The trade-offs among the prediction horizon, spatial resolution, and accuracy are illustrated, and the detrimental effect of spatially biased occurrence of extreme event on prediction accuracy is discussed. The deep learning framework is viable for predicting extreme events in the real world.
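The "when" labeling scheme can be sketched for a 1-D series (the paper works with 2-D snapshots, and its exact scheme may differ in detail):

```python
def label_extreme_events(series, threshold, horizon):
    """Label frame t with 1 iff an extreme value (>= threshold) occurs
    anywhere within the next `horizon` frames, else 0.

    Labels built this way turn "predict an extreme event within the horizon"
    into a supervised classification target for each snapshot.
    """
    labels = []
    for t in range(len(series)):
        window = series[t + 1 : t + 1 + horizon]
        labels.append(1 if any(v >= threshold for v in window) else 0)
    return labels
```

A "where" label would additionally record, at some spatial resolution, the cell in which the extreme value occurs.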
    Conditional Autoregressors are Interpretable Classifiers. (arXiv:2203.17002v1 [cs.LG])
    We explore the use of class-conditional autoregressive (CA) models to perform image classification on MNIST-10. Autoregressive models assign probability to an entire input by combining probabilities from each individual feature; hence, classification decisions made by a CA can be readily decomposed into contributions from each input feature. That is to say, CAs are inherently locally interpretable. Our experiments show that naively training a CA achieves much worse accuracy than a standard classifier; however, this is due to over-fitting and not a lack of expressive power. Using knowledge distillation from a standard classifier, a student CA can be trained to match the performance of the teacher while still being interpretable.
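A minimal class-conditional autoregressive classifier shows the mechanics, using a first-order Markov chain per class over binary sequences in place of the paper's image models: prediction is argmax over per-class log-likelihoods, and each log p(x_t | x_{t-1}, c) term is an inspectable per-feature contribution:

```python
import math
from collections import defaultdict

class MarkovClassConditional:
    """Per-class first-order Markov chain over binary sequences.

    fit() counts initial symbols and transitions per class; predict() returns
    the class whose chain assigns the highest (Laplace-smoothed) log-likelihood.
    """
    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.models = {}  # class -> (initial counts, transition counts)

    def fit(self, sequences, labels):
        for seq, c in zip(sequences, labels):
            init, trans = self.models.setdefault(c, (defaultdict(float), defaultdict(float)))
            init[seq[0]] += 1
            for prev, cur in zip(seq, seq[1:]):
                trans[(prev, cur)] += 1

    def log_likelihood(self, seq, c):
        init, trans = self.models[c]
        s = self.smoothing
        ll = math.log((init[seq[0]] + s) / (sum(init.values()) + 2 * s))
        for prev, cur in zip(seq, seq[1:]):
            row = trans[(prev, 0)] + trans[(prev, 1)]
            ll += math.log((trans[(prev, cur)] + s) / (row + 2 * s))
        return ll

    def predict(self, seq):
        return max(self.models, key=lambda c: self.log_likelihood(seq, c))
```

Because the total log-likelihood is a sum over positions, each position's term directly quantifies its contribution to the decision, which is the local interpretability the abstract describes.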
    Generating Scientific Articles with Machine Learning. (arXiv:2203.16569v1 [cs.LG])
    In recent years, the field of machine learning has seen rapid growth, with applications in a variety of domains, including image recognition, natural language processing, and predictive modeling. In this paper, we explore the application of machine learning to the generation of scientific articles. We present a method in which a machine-learning algorithm learns the structure of a scientific article from a training set of scientific papers and then generates a new article based on that data set. We evaluate the method by comparing the generated article to a set of manually written articles. The results show that the machine-generated article is of similar quality to the manually written ones.
    Spatially Adaptive Online Prediction of Piecewise Regular Functions. (arXiv:2203.16587v1 [math.ST])
    We consider the problem of estimating piecewise regular functions in an online setting, i.e., the data arrive sequentially and at any round our task is to predict the value of the true function at the next revealed point using the available data from past predictions. We propose a suitably modified version of a recently developed online learning algorithm called the sleeping experts aggregation algorithm. We show that this estimator satisfies oracle risk bounds simultaneously for all local regions of the domain. As concrete instantiations of the expert aggregation algorithm proposed here, we study an online mean aggregation and an online linear regression aggregation algorithm where experts correspond to the set of dyadic subrectangles of the domain. The resulting algorithms are near linear time computable in the sample size. We specifically focus on the performance of these online algorithms in the context of estimating piecewise polynomial and bounded variation function classes in the fixed design setup. The simultaneous oracle risk bounds we obtain for these estimators in this context provide new and improved (in certain aspects) guarantees even in the batch setting and are not available for the state of the art batch learning estimators.
    Robust Meta-Reinforcement Learning with Curriculum-Based Task Sampling. (arXiv:2203.16801v1 [cs.LG])
    Meta-reinforcement learning (meta-RL) acquires meta-policies that show good performance for tasks in a wide task distribution. However, conventional meta-RL, which learns meta-policies by randomly sampling tasks, has been reported to show meta-overfitting for certain tasks, especially easy tasks where an agent can readily obtain high scores. To reduce the effects of meta-overfitting, we consider meta-RL with curriculum-based task sampling. Our method, Robust Meta Reinforcement Learning with Guided Task Sampling (RMRL-GTS), is an effective method that restricts task sampling based on scores and epochs. We show that in order to achieve robust meta-RL, it is necessary not only to intensively sample tasks with poor scores, but also to restrict and expand the task regions of the tasks to be sampled.
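Score-restricted task sampling can be sketched as softmax-style weighting toward poorly scoring tasks (an illustration of the idea only; RMRL-GTS additionally restricts and expands task regions over epochs, which is not modelled here):

```python
import math, random

def sample_task(task_scores, temperature=0.5, rng=None):
    """Sample a task id with probability increasing as its score decreases.

    weight_t is proportional to exp((max_score - score_t) / temperature), so
    tasks with poor scores are sampled intensively while easy, high-scoring
    tasks are sampled rarely.
    """
    rng = rng or random.Random(0)
    best = max(task_scores.values())
    weights = {t: math.exp((best - s) / temperature) for t, s in task_scores.items()}
    u = rng.random() * sum(weights.values())
    acc = 0.0
    for t, w in weights.items():  # weighted roulette-wheel selection
        acc += w
        if u <= acc:
            return t
    return t
```

Lowering the temperature concentrates sampling harder on the worst-scoring tasks.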
    Mask Atari for Deep Reinforcement Learning as POMDP Benchmarks. (arXiv:2203.16777v1 [cs.CV])
We present Mask Atari, a new benchmark to help solve partially observable Markov decision process (POMDP) problems with Deep Reinforcement Learning (DRL)-based approaches. To achieve a simulation environment for the POMDP problems, Mask Atari is constructed based on Atari 2600 games with controllable, moveable, and learnable masks as the observation area for the target agent, especially with the active information gathering (AIG) setting in POMDPs. Given that one does not yet exist, Mask Atari provides a challenging, efficient benchmark for evaluating the methods that focus on the above problem. Moreover, the mask operation is a trial for introducing the receptive field in the human vision system into a simulation environment for an agent, which means the evaluations are not biased by sensing ability and focus purely on the cognitive performance of the methods when compared with the human baseline. We describe the challenges and features of our benchmark and evaluate several baselines with Mask Atari.
    Learning the Effect of Registration Hyperparameters with HyperMorph. (arXiv:2203.16680v1 [cs.CV])
    We introduce HyperMorph, a framework that facilitates efficient hyperparameter tuning in learning-based deformable image registration. Classical registration algorithms perform an iterative pair-wise optimization to compute a deformation field that aligns two images. Recent learning-based approaches leverage large image datasets to learn a function that rapidly estimates a deformation for a given image pair. In both strategies, the accuracy of the resulting spatial correspondences is strongly influenced by the choice of certain hyperparameter values. However, an effective hyperparameter search consumes substantial time and human effort as it often involves training multiple models for different fixed hyperparameter values and may lead to suboptimal registration. We propose an amortized hyperparameter learning strategy to alleviate this burden by learning the impact of hyperparameters on deformation fields. We design a meta network, or hypernetwork, that predicts the parameters of a registration network for input hyperparameters, thereby comprising a single model that generates the optimal deformation field corresponding to given hyperparameter values. This strategy enables fast, high-resolution hyperparameter search at test-time, reducing the inefficiency of traditional approaches while increasing flexibility. We also demonstrate additional benefits of HyperMorph, including enhanced robustness to model initialization and the ability to rapidly identify optimal hyperparameter values specific to a dataset, image contrast, task, or even anatomical region, all without the need to retrain models. We make our code publicly available at this http URL
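The amortization idea is easiest to see on a toy problem. Below, ridge regression stands in for the registration network and a least-squares fit on polynomial features of log λ stands in for the hypernetwork; the data, features, and sampled λ values are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "main task": ridge regression y ≈ X w, whose optimal weights
# depend on the regularization hyperparameter lam in closed form.
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

def ridge_weights(lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Stand-in "hypernetwork": map lam -> model weights, fit on a handful of
# sampled hyperparameter values (polynomial features of log-lam stand in
# for a neural network here).
lams = np.array([0.01, 0.1, 1.0, 10.0])
feats = lambda l: np.vander(np.log(l), 4)            # (n, 4) features
W = np.stack([ridge_weights(l) for l in lams])       # (4, 3) targets
A, *_ = np.linalg.lstsq(feats(lams), W, rcond=None)  # hypernetwork params

def hyper_weights(lam):
    # One cheap "forward pass" yields weights for an unseen lam.
    return (feats(np.array([lam])) @ A)[0]
```

The point mirrors the abstract: once the mapping from hyperparameter to model is learned, hyperparameter search at test time needs no retraining.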
    MAE-AST: Masked Autoencoding Audio Spectrogram Transformer. (arXiv:2203.16691v1 [eess.AS])
    In this paper, we propose a simple yet powerful improvement over the recent Self-Supervised Audio Spectrogram Transformer (SSAST) model for speech and audio classification. Specifically, we leverage the insight that the SSAST uses a very high masking ratio (75%) during pretraining, meaning that the vast majority of self-attention compute is performed on mask tokens. We address this by integrating the encoder-decoder architecture from Masked Autoencoders are Scalable Vision Learners (MAE) into the SSAST, where a deep encoder operates on only unmasked input, and a shallow decoder operates on encoder outputs and mask tokens. We find that MAE-like pretraining can provide a 3x speedup and 2x memory usage reduction over the vanilla SSAST using current audio pretraining strategies with ordinary model and input sizes. When fine-tuning on downstream tasks, which only uses the encoder, we find that our approach outperforms the SSAST on a variety of downstream tasks. We further conduct comprehensive evaluations into different strategies of pretraining and explore differences in MAE-style pretraining between the visual and audio domains.
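The encoder/decoder split can be sketched shape-wise (a stand-in tanh "encoder" and identity "decoder"; the mask-token value and sizes are illustrative, not the MAE-AST architecture):

```python
import numpy as np

def mae_forward(tokens, mask_ratio=0.75, rng=None):
    """MAE-style split: the (stand-in) encoder sees only unmasked tokens,
    the decoder sees encoder outputs plus mask tokens, so most of the
    heavy encoder compute is skipped."""
    rng = rng or np.random.default_rng(0)
    n, d = tokens.shape
    keep = max(1, int(round(n * (1 - mask_ratio))))
    idx = rng.permutation(n)
    visible, masked = idx[:keep], idx[keep:]
    enc_out = np.tanh(tokens[visible])   # stand-in for a deep encoder
    dec_in = np.zeros((n, d))
    dec_in[visible] = enc_out
    dec_in[masked] = 0.1                 # stand-in for a learned mask token
    recon = dec_in                       # stand-in for a shallow decoder
    return recon, visible, masked

tokens = np.random.default_rng(1).normal(size=(16, 8))
recon, vis, msk = mae_forward(tokens)
# With a 75% mask ratio the "encoder" processed only 4 of 16 tokens,
# which is where the claimed speedup and memory reduction come from.
```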
    Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs. (arXiv:2203.16940v1 [eess.AS])
In this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which represent a good approximation of the continuous space of spherical rotations, and can be implemented using standard 2D convolutional layers, having a lower computational cost than most of the spherical CNNs. In addition, instead of using fully connected layers after the icosahedral convolutions, we propose a new soft-argmax function that can be seen as a differentiable version of the argmax function and allows us to solve the DOA estimation as a regression problem interpreting the output of the convolutional layers as a probability distribution. We prove that using models that fit the equivariances of the problem allows us to outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors lower than 10° even in scenarios with a reverberation time $T_{60}$ of 1.5 s.
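A minimal soft-argmax, assuming a 1-D grid of candidate angles and a toy Gaussian power map in place of a real SRP-PHAT map:

```python
import numpy as np

def soft_argmax(scores, coords, beta=10.0):
    """Differentiable argmax: softmax over scores, then the expected
    coordinate under that distribution (here used for DOA regression
    instead of a hard, non-differentiable argmax over the power map)."""
    p = np.exp(beta * (scores - scores.max()))
    p /= p.sum()
    return p @ coords

angles = np.linspace(0.0, 180.0, 181)                 # candidate DOAs (deg)
power = np.exp(-0.5 * ((angles - 63.0) / 5.0) ** 2)   # toy power map
est = soft_argmax(power, angles)
```

Because the output is a probability-weighted average rather than a discrete index, gradients flow through it, which is what lets the network be trained as a regressor.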
    Differentially Private Federated Learning via Reconfigurable Intelligent Surface. (arXiv:2203.17028v1 [eess.SP])
    Federated learning (FL), as a disruptive machine learning paradigm, enables the collaborative training of a global model over decentralized local datasets without sharing them. It spans a wide scope of applications from Internet-of-Things (IoT) to biomedical engineering and drug discovery. To support low-latency and high-privacy FL over wireless networks, in this paper, we propose a reconfigurable intelligent surface (RIS) empowered over-the-air FL system to alleviate the dilemma between learning accuracy and privacy. This is achieved by simultaneously exploiting the channel propagation reconfigurability with RIS for boosting the receive signal power, as well as waveform superposition property with over-the-air computation (AirComp) for fast model aggregation. By considering a practical scenario where high-dimensional local model updates are transmitted across multiple communication blocks, we characterize the convergence behaviors of the differentially private federated optimization algorithm. We further formulate a system optimization problem to optimize the learning accuracy while satisfying privacy and power constraints via the joint design of transmit power, artificial noise, and phase shifts at RIS, for which a two-step alternating minimization framework is developed. Simulation results validate our systematic, theoretical, and algorithmic achievements and demonstrate that RIS can achieve a better trade-off between privacy and accuracy for over-the-air FL systems.
    Message Passing Neural Networks for Hypergraphs. (arXiv:2203.16995v1 [cs.LG])
    Hypergraph representations are both more efficient and better suited to describe data characterized by relations between two or more objects. In this work, we present the first graph neural network based on message passing capable of processing hypergraph-structured data. We show that the proposed model defines a design space for neural network models for hypergraphs, thus generalizing existing models for hypergraphs. We report experiments on a benchmark dataset for node classification, highlighting the effectiveness of the proposed model with respect to other state-of-the-art methods for graphs and hypergraphs. We also discuss the benefits of using hypergraph representations and, at the same time, highlight the limitation of using equivalent graph representations when the underlying problem has relations among more than two objects.
    Bangla hate speech detection on social media using attention-based recurrent neural network. (arXiv:2203.16775v1 [cs.CL])
Hate speech has spread more rapidly through the daily use of technology and, most notably, through the sharing of negative opinions or feelings on social media. Although numerous works have been carried out on detecting hate speech in English, German, and other languages, very few have addressed the Bengali language, even though millions of people communicate on social media in Bengali. The few existing works need improvements in both accuracy and interpretability. This article proposes an encoder-decoder-based machine learning model, a popular tool in NLP, to classify users' Bengali comments on Facebook pages. A dataset of 7,425 Bengali comments, covering seven distinct categories of hate speech, was used to train and evaluate our model. For extracting and encoding local features from the comments, 1D convolutional layers were used. Finally, attention-based, LSTM, and GRU decoders were used to predict hate speech categories. Among the three encoder-decoder algorithms, the attention-based decoder obtained the best accuracy (77%).
    Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo. (arXiv:2203.17248v1 [cs.LG])
Contrastive learning (CL) is widely known to require many negative samples, 65536 in MoCo for instance, for which the performance of a dictionary-free framework is often inferior because the negative sample size (NSS) is limited by its mini-batch size (MBS). To decouple the NSS from the MBS, a dynamic dictionary has been adopted in a large volume of CL frameworks, among which arguably the most popular one is the MoCo family. In essence, MoCo adopts a momentum-based queue dictionary, for which we perform a fine-grained analysis of its size and consistency. We point out that the InfoNCE loss used in MoCo implicitly attracts anchors to their corresponding positive samples with varying strengths of penalty, and identify this inter-anchor hardness-awareness property as a major reason for the necessity of a large dictionary. Our findings motivate us to simplify MoCo v2 via the removal of its dictionary as well as momentum. Based on an InfoNCE with the proposed dual temperature, our simplified frameworks, SimMoCo and SimCo, outperform MoCo v2 by a visible margin. Moreover, our work bridges the gap between CL and non-CL frameworks, contributing to a more unified understanding of these two mainstream frameworks in SSL. Code is available at: https://bit.ly/3LkQbaT.
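A rough numerical sketch of the dual-temperature idea as described here, not the paper's exact loss: one temperature shapes the InfoNCE value, while a second, detached temperature sets a per-anchor reweighting; the similarity values and both temperatures are illustrative:

```python
import numpy as np

def info_nce_dual_temp(sim_pos, sim_neg, tau_a=0.1, tau_b=1.0):
    """Sketch of a dual-temperature InfoNCE: the loss value uses tau_a,
    while a per-anchor weight computed at tau_b (treated as a detached
    constant) modulates the inter-anchor penalty strength."""
    logits_a = np.concatenate([[sim_pos], sim_neg]) / tau_a
    m = logits_a.max()
    loss = -logits_a[0] + np.log(np.exp(logits_a - m).sum()) + m
    # hedged stand-in for the detached reweighting at the 2nd temperature
    logits_b = np.concatenate([[sim_pos], sim_neg]) / tau_b
    w = 1.0 - np.exp(logits_b[0]) / np.exp(logits_b).sum()
    return w * loss

# Anchors with a well-matched positive ("easy") incur a smaller penalty
# than anchors with a poorly-matched positive ("hard").
easy = info_nce_dual_temp(0.9, np.array([0.1, 0.2, 0.0]))
hard = info_nce_dual_temp(0.1, np.array([0.3, 0.2, 0.0]))
```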
    Intelligent Icing Detection Model of Wind Turbine Blades Based on SCADA data. (arXiv:2101.07914v1 [cs.LG] CROSS LISTED)
Diagnosing ice accretion on wind turbine blades has long been a difficult problem in the condition monitoring of wind farms. Existing methods focus on mechanistic analysis of the icing process or deviation analysis through feature engineering. However, neural networks have not yet been deeply researched in this field. Supervisory control and data acquisition (SCADA) makes it possible to train networks by continuously providing not only operation and performance parameters of wind turbines but also environmental parameters and operation modes. This paper explores the use of convolutional neural networks (CNNs), generative adversarial networks (GANs), and domain adaptation learning to establish intelligent diagnosis frameworks under different training scenarios. Specifically, PGANC and PGANT are proposed for sufficient and insufficient target wind turbine labeled data, respectively. The basic idea is a two-stage training with parallel GANs, which are aimed at capturing intrinsic features of normal and icing samples, followed by a classification CNN or a domain adaptation module in the various training cases. Model validation on SCADA data from three wind turbines shows that two-stage training can effectively improve model performance. Moreover, when there is no sufficient labeled data for a target turbine, an extremely common situation in real industrial practice, the addition of domain adaptation learning yields better performance from the trained model. Overall, our proposed intelligent diagnosis frameworks achieve more accurate detection on the same wind turbine and better generalization to a new wind turbine, compared with other machine learning models and conventional CNNs.
    A data-driven approach for the closure of RANS models by the divergence of the Reynolds Stress Tensor. (arXiv:2203.16944v1 [physics.flu-dyn])
In this paper, a new data-driven model to close the RANS equations and increase their accuracy is proposed. It is based on the direct approximation of the divergence of the Reynolds Stress Tensor (RST) through a Neural Network (NN). This choice is driven by the presence of the divergence of the RST in the RANS equations. Furthermore, once this data-driven approach is trained, there is no need to run any turbulence model to close the equations. Finally, it is well known that a good approximation of a function is not necessarily a good approximation of its derivative. The architecture and input choices of the proposed network guarantee both Galilean and coordinate-frame rotation invariances through a vector-basis expansion of the divergence of the RST. Two well-known test cases are used to show the advantages of the proposed method compared to classic turbulence models.
    Graph Node-Feature Convolution for Representation Learning. (arXiv:1812.00086v2 [cs.LG] UPDATED)
The graph convolutional network (GCN) is an emerging neural network approach. It learns a new representation of a node by aggregating the feature vectors of all its neighbors, without considering whether those neighbors or features are useful. Recent methods improve on this by sampling a fixed-size set of neighbors or by assigning different weights to different neighbors during aggregation, but features within a feature vector are still treated equally. In this paper, we introduce a new convolution operation on regular-size feature maps, constructed from the features of a fixed node bandwidth via sampling, to obtain the first-level node representation, which is then passed to a standard GCN to learn the second-level node representation. Experiments show that our method outperforms competing methods in semi-supervised node classification tasks. Furthermore, our method opens new doors for exploring new GCN architectures, particularly deeper GCN models.
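The fixed-bandwidth sampling step can be sketched as follows (a toy stand-in: the convolution itself is omitted, and the sampling rule is an assumption for illustration, not the paper's exact procedure):

```python
import numpy as np

def node_feature_maps(adj, feats, bandwidth, rng=None):
    """Build a regular (bandwidth x d) feature map per node by sampling a
    fixed number of neighbors (with replacement when the degree is small),
    so that a standard convolution can slide over it."""
    rng = rng or np.random.default_rng(0)
    n, d = feats.shape
    maps = np.empty((n, bandwidth, d))
    for v in range(n):
        nbrs = np.flatnonzero(adj[v])
        if nbrs.size == 0:
            nbrs = np.array([v])      # isolated node: fall back to itself
        pick = rng.choice(nbrs, size=bandwidth,
                          replace=nbrs.size < bandwidth)
        maps[v] = feats[pick]
    return maps
```

The regular shape is the point: unlike variable-degree neighborhoods, these maps can feed an ordinary convolution that learns which features within each vector matter.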
    Data-driven Set-based Estimation of Polynomial Systems with Application to SIR Epidemics. (arXiv:2111.04704v2 [eess.SY] UPDATED)
This paper proposes a data-driven set-based estimation algorithm for a class of nonlinear systems with polynomial nonlinearities. Using the system's input-output data, the proposed method computes, in real time, a set that is guaranteed to contain the system's state. Although the system is assumed to be of polynomial type, the exact polynomial functions and their coefficients are unknown. To this end, the estimator relies on offline and online phases. The offline phase utilizes past input-output data to estimate a set of possible coefficients of the polynomial system. Then, using this estimated set of coefficients and side information about the system, the online phase provides a set estimate of the state. Finally, the proposed methodology is evaluated through its application to the SIR (Susceptible, Infected, Recovered) epidemic model.
    Reinforcement Learning Based Query Vertex Ordering Model for Subgraph Matching. (arXiv:2201.11251v2 [cs.LG] UPDATED)
Subgraph matching is a fundamental problem in various fields that use graph-structured data. Subgraph matching algorithms enumerate all isomorphic embeddings of a query graph q in a data graph G. An important branch of matching algorithms exploits the backtracking search approach, which recursively extends intermediate results following a matching order of query vertices. It has been shown that the matching order plays a critical role in the time efficiency of these backtracking-based subgraph matching algorithms. In recent years, many advanced techniques for query vertex ordering (i.e., matching order generation) have been proposed to reduce the unpromising intermediate results according to preset heuristic rules. In this paper, for the first time, we apply Reinforcement Learning (RL) and Graph Neural Network (GNN) techniques to generate high-quality matching orders for subgraph matching algorithms. Instead of using fixed heuristics to generate the matching order, our model can capture and make full use of the graph information, and thus determine the query vertex order with an adaptive learning-based rule that significantly reduces the number of redundant enumerations. With the help of the reinforcement learning framework, our model is able to consider long-term benefits rather than only the local information at the current ordering step. Extensive experiments on six real-life data graphs demonstrate that our proposed matching order generation technique reduces query processing time by up to two orders of magnitude compared to the state-of-the-art algorithms.
    Interpretation of Black Box NLP Models: A Survey. (arXiv:2203.17081v1 [cs.LG])
An increasing number of machine learning models have been deployed in domains with high stakes such as finance and healthcare. Despite their superior performance, many models are black boxes by nature and hard to explain. There are growing efforts to develop methods to interpret these black-box models. Post hoc explanations based on perturbations, such as LIME, are widely used approaches to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing serious challenges to the effectiveness of the method itself and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem to determine the number of perturbation points needed to guarantee stability of the resulting explanation. Experiments on both simulated and real-world data sets are provided to demonstrate the effectiveness of our method.
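The CLT-based sample-size rule can be sketched as follows (a hedged stand-in for S-LIME: a normal confidence interval on a scalar attribution, with the pilot data and the margin invented for illustration):

```python
import numpy as np

def required_perturbations(samples, margin, z=1.96):
    """CLT-based sample-size rule in the spirit of S-LIME: how many
    perturbation points are needed so the half-width of a normal
    confidence interval on the mean attribution drops below `margin`."""
    s = np.std(samples, ddof=1)
    return int(np.ceil((z * s / margin) ** 2))

rng = np.random.default_rng(0)
pilot = rng.normal(0.4, 0.2, size=100)  # toy pilot attributions
n_needed = required_perturbations(pilot, margin=0.02)
```

The mechanism matches the abstract's claim: instability is traded away by sampling enough perturbations that the estimated explanation stops fluctuating beyond a chosen tolerance.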
    Stability and Generalization Capabilities of Message Passing Graph Neural Networks. (arXiv:2202.00645v3 [cs.LG] UPDATED)
    Message passing neural networks (MPNN) have seen a steep rise in popularity since their introduction as generalizations of convolutional neural networks to graph structured data, and are now considered state-of-the-art tools for solving a large variety of graph-focused problems. We study the generalization capabilities of MPNNs in graph classification. We assume that graphs of different classes are sampled from different random graph models. Based on this data distribution, we derive a non-asymptotic bound on the generalization gap between the empirical and statistical loss, that decreases to zero as the graphs become larger. This is proven by showing that a MPNN, applied on a graph, approximates the MPNN applied on the geometric model that the graph discretizes.
    BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications. (arXiv:2110.05781v2 [eess.AS] UPDATED)
Automatic speech recognition (ASR) allows transcribing the communications between air traffic controllers (ATCOs) and aircraft pilots. The transcriptions are used later to extract ATC named entities, e.g., aircraft callsigns, command types, or values. One common point of failure is the Speech Activity Detection (SAD) and diarization system: if either fails, two or more single-speaker segments remain in the same recording, jeopardizing the overall system's performance. We propose a system that combines the segmentation of a SAD module with a BERT model that performs speaker change detection (SCD) and speaker role detection (SRD) by chunking ASR transcripts, i.e., diarization with a defined number of speakers together with SRD. The proposed model is evaluated on real-life ATC test sets. It reaches up to 0.90/0.95 F1-score on ATCO/pilot SRD, which corresponds to a 27% relative improvement in diarization error rate (DER) compared to standard acoustic-based diarization. Results are measured on ASR transcripts of challenging ATC test sets with $\sim$13\% word error rate, and the robustness of the system is further validated on noisy ASR transcripts.
    Sequence Transduction with Graph-based Supervision. (arXiv:2111.01272v2 [cs.CL] UPDATED)
    The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production. Similarly to the connectionist temporal classification (CTC) objective, the RNN-T loss uses specific rules that define how a set of alignments is generated to form a lattice for the full-sum training. However, it is yet largely unknown if these rules are optimal and do lead to the best possible ASR results. In this work, we present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels, thus providing a flexible and efficient framework to manipulate training lattices, e.g., for studying different transition rules, implementing different transducer losses, or restricting alignments. We demonstrate that transducer-based ASR with CTC-like lattice achieves better results compared to standard RNN-T, while also ensuring a strictly monotonic alignment, which will allow better optimization of the decoding procedure. For example, the proposed CTC-like transducer achieves an improvement of 4.8% on the test-other condition of LibriSpeech relative to an equivalent RNN-T based system.
    D2ADA: Dynamic Density-aware Active Domain Adaptation for Semantic Segmentation. (arXiv:2202.06484v3 [cs.CV] UPDATED)
    In the field of domain adaptation, a trade-off exists between the model performance and the number of target domain annotations. Active learning, maximizing model performance with few informative labeled data, comes in handy for such a scenario. In this work, we present D2ADA, a general active domain adaptation framework for semantic segmentation. To adapt the model to the target domain with minimum queried labels, we propose acquiring labels of the samples with high probability density in the target domain yet with low probability density in the source domain, complementary to the existing source domain labeled data. To further facilitate labeling efficiency, we design a dynamic scheduling policy to adjust the labeling budgets between domain exploration and model uncertainty over time. Extensive experiments show that our method outperforms existing active learning and domain adaptation baselines on two benchmarks, GTA5 -> Cityscapes and SYNTHIA -> Cityscapes. With less than 5% target domain annotations, our method reaches comparable results with that of full supervision.
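The acquisition rule can be sketched in 1-D with kernel density estimates standing in for the paper's density estimators (the data, bandwidth, and candidate grid are illustrative):

```python
import numpy as np

def kde(points, x, h=0.5):
    """1-D Gaussian kernel density estimate at query points x."""
    z = (x[:, None] - points[None, :]) / h
    return np.exp(-0.5 * z ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def density_aware_query(source, target, candidates, k):
    """Pick candidates dense under the target distribution but sparse
    under the source one (the D2ADA-style acquisition idea, sketched
    with KDEs in place of the paper's estimators)."""
    score = kde(target, candidates) / (kde(source, candidates) + 1e-12)
    return candidates[np.argsort(-score)[:k]]

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, 500)   # toy source-domain features
target = rng.normal(3.0, 1.0, 500)   # toy target-domain features
cands = np.linspace(-2.0, 5.0, 50)
picked = density_aware_query(source, target, cands, k=5)
```

The selected points sit where the target density dominates the source density, i.e., exactly where labels complement the existing source supervision.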
    Bayesian optimization with known experimental and design constraints for chemistry applications. (arXiv:2203.17241v1 [math.OC])
    Optimization strategies driven by machine learning, such as Bayesian optimization, are being explored across experimental sciences as an efficient alternative to traditional design of experiment. When combined with automated laboratory hardware and high-performance computing, these strategies enable next-generation platforms for autonomous experimentation. However, the practical application of these approaches is hampered by a lack of flexible software and algorithms tailored to the unique requirements of chemical research. One such aspect is the pervasive presence of constraints in the experimental conditions when optimizing chemical processes or protocols, and in the chemical space that is accessible when designing functional molecules or materials. Although many of these constraints are known a priori, they can be interdependent, non-linear, and result in non-compact optimization domains. In this work, we extend our experiment planning algorithms Phoenics and Gryffin such that they can handle arbitrary known constraints via an intuitive and flexible interface. We benchmark these extended algorithms on continuous and discrete test functions with a diverse set of constraints, demonstrating their flexibility and robustness. In addition, we illustrate their practical utility in two simulated chemical research scenarios: the optimization of the synthesis of o-xylenyl Buckminsterfullerene adducts under constrained flow conditions, and the design of redox active molecules for flow batteries under synthetic accessibility constraints. The tools developed constitute a simple, yet versatile strategy to enable model-based optimization with known experimental constraints, contributing to its applicability as a core component of autonomous platforms for scientific discovery.
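Constraint handling of the "known a priori" kind can be sketched as a feasibility mask inside the acquisition step (a minimal stand-in, not the Phoenics/Gryffin interface; the acquisition function and constraint are toy choices):

```python
import numpy as np

def constrained_propose(acq, is_feasible, candidates):
    """Restrict an acquisition function to the known-feasible region:
    infeasible candidates are excluded before the argmax."""
    mask = np.array([is_feasible(x) for x in candidates])
    if not mask.any():
        raise ValueError("no feasible candidate")
    scores = np.where(mask, acq(candidates), -np.inf)
    return candidates[np.argmax(scores)]

# Toy setup: the unconstrained optimum at 0.8 is infeasible, so the
# proposal lands at the best point inside the allowed region.
cands = np.linspace(0.0, 1.0, 101)
acq = lambda x: -(x - 0.8) ** 2    # acquisition peaks at 0.8 ...
feas = lambda x: x <= 0.6          # ... but only x <= 0.6 is allowed
best = constrained_propose(acq, feas, cands)
```

Real constraints can be interdependent and non-linear, as the abstract notes; the mask only needs a boolean oracle, which is why known constraints are comparatively cheap to support.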
    GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems. (arXiv:2201.09562v3 [cs.LG] UPDATED)
    Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for high-dimensional systems while giving safety and optimality guarantees. We demonstrate the superiority of GoSafeOpt over competing model-free safe learning methods on a robot arm that would be prohibitive for GoSafe.
    BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information. (arXiv:2203.15536v2 [cs.CV] UPDATED)
    Our goal is to recover the 3D shape and pose of dogs from a single image. This is a challenging task because dogs exhibit a wide range of shapes and appearances, and are highly articulated. Recent work has proposed to directly regress the SMAL animal model, with additional limb scale parameters, from images. Our method, called BARC (Breed-Augmented Regression using Classification), goes beyond prior work in several important ways. First, we modify the SMAL shape space to be more appropriate for representing dog shape. But, even with a better shape model, the problem of regressing dog shape from an image is still challenging because we lack paired images with 3D ground truth. To compensate for the lack of paired data, we formulate novel losses that exploit information about dog breeds. In particular, we exploit the fact that dogs of the same breed have similar body shapes. We formulate a novel breed similarity loss consisting of two parts: One term encourages the shape of dogs from the same breed to be more similar than dogs of different breeds. The second one, a breed classification loss, helps to produce recognizable breed-specific shapes. Through ablation studies, we find that our breed losses significantly improve shape accuracy over a baseline without them. We also compare BARC qualitatively to WLDO with a perceptual study and find that our approach produces dogs that are significantly more realistic. This work shows that a-priori information about genetic similarity can help to compensate for the lack of 3D training data. This concept may be applicable to other animal species or groups of species. Our code is publicly available for research purposes at https://barc.is.tue.mpg.de/.
    SGTR: End-to-end Scene Graph Generation with Transformer. (arXiv:2112.12970v3 [cs.CV] UPDATED)
    Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property. Most previous works adopt a bottom-up two-stage or a point-based one-stage approach, which often suffers from high time complexity or sub-optimal designs. In this work, we propose a novel SGG method to address the aforementioned issues, formulating the task as a bipartite graph construction problem. To solve the problem, we develop a transformer-based end-to-end framework that first generates the entity and predicate proposal set, followed by inferring directed edges to form the relation triplets. In particular, we develop a new entity-aware predicate representation based on a structural predicate generator that leverages the compositional property of relationships. Moreover, we design a graph assembling module to infer the connectivity of the bipartite scene graph based on our entity-aware structure, enabling us to generate the scene graph in an end-to-end manner. Extensive experimental results show that our design is able to achieve the state-of-the-art or comparable performance on two challenging benchmarks, surpassing most of the existing approaches and enjoying higher efficiency in inference. We hope our model can serve as a strong baseline for the Transformer-based scene graph generation. Code is available: https://github.com/Scarecrow0/SGTR
    GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design. (arXiv:2112.11594v2 [cs.AR] UPDATED)
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art graph learning model. However, it can be notoriously challenging to run inference on GCNs over large graph datasets, limiting their application to large real-world graphs and hindering the exploration of deeper and more sophisticated GCN models. This is because real-world graphs can be extremely large and sparse. Furthermore, the node degrees of GCNs tend to follow a power-law distribution, yielding highly irregular adjacency matrices that cause prohibitive inefficiencies in both data processing and movement, and thus substantially limit the achievable GCN acceleration efficiency. To this end, this paper proposes a GCN algorithm and accelerator co-design framework dubbed GCoD which can largely alleviate the aforementioned GCN irregularity and boost GCNs' inference efficiency. Specifically, on the algorithm level, GCoD integrates a split-and-conquer GCN training strategy that polarizes the graphs to be either denser or sparser in local neighborhoods without compromising the model accuracy, resulting in graph adjacency matrices that (mostly) have merely two levels of workload and enjoy largely enhanced regularity and thus ease of acceleration. On the hardware level, we further develop a dedicated two-pronged accelerator with a separate engine to process each of the aforementioned denser and sparser workloads, further boosting the overall utilization and acceleration efficiency. Extensive experiments and ablation studies validate that GCoD consistently reduces the number of off-chip accesses, leading to speedups of 15286x, 294x, 7.8x, and 2.5x as compared to CPUs, GPUs, and prior-art GCN accelerators including HyGCN and AWB-GCN, respectively, while maintaining or even improving the task accuracy. Codes are available at https://github.com/RICE-EIC/GCoD.
    R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis. (arXiv:2203.17261v1 [cs.CV])
    Recent research explosion on Neural Radiance Field (NeRF) shows the encouraging potential to represent complex scenes with neural networks. One major drawback of NeRF is its prohibitive inference time: Rendering a single pixel requires querying the NeRF network hundreds of times. To resolve it, existing efforts mainly attempt to reduce the number of required sampled points. However, the problem of iterative sampling still exists. On the other hand, Neural Light Field (NeLF) presents a more straightforward representation over NeRF in novel view synthesis -- the rendering of a pixel amounts to one single forward pass without ray-marching. In this work, we present a deep residual MLP network (88 layers) to effectively learn the light field. We show the key to successfully learning such a deep NeLF network is to have sufficient data, for which we transfer the knowledge from a pre-trained NeRF model via data distillation. Extensive experiments on both synthetic and real-world scenes show the merits of our method over other counterpart algorithms. On the synthetic scenes, we achieve 26-35x FLOPs reduction (per camera ray) and 28-31x runtime speedup, meanwhile delivering significantly better (1.4-2.8 dB average PSNR improvement) rendering quality than NeRF without any customized implementation tricks.
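The distillation recipe can be sketched with a toy teacher and a cheap closed-form student (a sine curve and polynomial features stand in for NeRF and the 88-layer MLP; everything below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "teacher": slow but accurate (think NeRF); stand-in "student":
# a one-pass model (think NeLF) fit purely on teacher-generated data.
teacher = lambda x: np.sin(3 * x)        # pretend each call is expensive

x_distill = rng.uniform(-1, 1, 2000)     # densely sampled pseudo-inputs
y_distill = teacher(x_distill)           # teacher labels; no real data used

phi = lambda x: np.vander(x, 8)          # student's cheap feature basis
coef, *_ = np.linalg.lstsq(phi(x_distill), y_distill, rcond=None)
student = lambda x: phi(x) @ coef        # one forward pass per query
```

The pattern mirrors the abstract: the student never sees ground truth, only abundant teacher outputs, and abundance is what makes fitting the deep (here, closed-form) student tractable.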
    FBDNN: Filter Banks and Deep Neural Networks for Portable and Fast Brain-Computer Interfaces. (arXiv:2109.02165v4 [eess.SP] UPDATED)
    Objective: To propose novel SSVEP classification methodologies using deep neural networks (DNNs) and improve performances in single-channel and user-independent brain-computer interfaces (BCIs) with small data lengths. Approach: We propose the utilization of filter banks (creating sub-band components of the EEG signal) in conjunction with DNNs. In this context, we created three different models: a recurrent neural network (FBRNN) analyzing the time domain, a 2D convolutional neural network (FBCNN-2D) processing complex spectrum features and a 3D convolutional neural network (FBCNN-3D) analyzing complex spectrograms, which we introduce in this study as possible input for SSVEP classification. We tested our neural networks on three open datasets and conceived them so as not to require calibration from the final user, simulating a user-independent BCI. Results: The DNNs with the filter banks surpassed the accuracy of similar networks without this preprocessing step by considerable margins, and they outperformed common SSVEP classification methods (SVM and FBCCA) by even higher margins. Conclusion and significance: Filter banks allow different types of deep neural networks to more efficiently analyze the harmonic components of SSVEP. Complex spectrograms carry more information than complex spectrum features and the magnitude spectrum, allowing the FBCNN-3D to surpass the other CNNs. The performances obtained in the challenging classification problems indicate a strong potential for the construction of portable, economical, fast and low-latency BCIs.
    Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning. (arXiv:2111.14213v2 [cs.LG] UPDATED)
    Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices). However, the data distribution among clients is often non-IID in nature, making efficient optimization difficult. To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model. Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. To this end, we first present a systematic study informed by second-order indicators to better understand algorithm effectiveness in FL. Interestingly, we find that standard regularization methods are surprisingly strong performers in mitigating data heterogeneity effects. Based on our findings, we further propose a simple and effective method, FedAlign, to overcome data heterogeneity and the pitfalls of previous methods. FedAlign achieves competitive accuracy with state-of-the-art FL methods across a variety of settings while minimizing computation and memory overhead. Code is available at https://github.com/mmendiet/FedAlign
    MBORE: Multi-objective Bayesian Optimisation by Density-Ratio Estimation. (arXiv:2203.16912v1 [cs.LG])
    Optimisation problems often have multiple conflicting objectives that can be computationally and/or financially expensive. Mono-surrogate Bayesian optimisation (BO) is a popular model-based approach for optimising such black-box functions. It combines objective values via scalarisation and builds a Gaussian process (GP) surrogate of the scalarised values. The location which maximises a cheap-to-query acquisition function is chosen as the next location to expensively evaluate. While BO is an effective strategy, the use of GPs is limiting. Their performance decreases as the problem input dimensionality increases, and their computational complexity scales cubically with the amount of data. To address these limitations, we extend previous work on BO by density-ratio estimation (BORE) to the multi-objective setting. BORE links the computation of the probability of improvement acquisition function to that of probabilistic classification. This enables the use of state-of-the-art classifiers in a BO-like framework. In this work we present MBORE: multi-objective Bayesian optimisation by density-ratio estimation, and compare it to BO across a range of synthetic and real-world benchmarks. We find that MBORE performs as well as or better than BO on a wide variety of problems, and that it outperforms BO on high-dimensional and real-world problems.
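The classification view that BORE takes can be illustrated with a toy sketch: label the best gamma-fraction of observed scalarised objective values as positive, fit any probabilistic classifier, and rank candidates by the predicted positive-class probability. The k-nearest-neighbour "classifier", the gamma value, and the 1-D test function below are illustrative stand-ins, not the paper's setup (MBORE plugs in state-of-the-art classifiers):

```python
import math

def bore_labels(scalarised, gamma=0.25):
    """Label the best gamma-quantile of observations (lowest values) as 1, rest as 0."""
    tau = sorted(scalarised)[int(math.ceil(gamma * len(scalarised))) - 1]
    return [1 if y <= tau else 0 for y in scalarised]

def acquisition(x, X, labels, k=3):
    """k-NN estimate of P(y=1 | x): a simple stand-in for the classifier
    whose output BORE uses as the acquisition function."""
    nearest = sorted(range(len(X)), key=lambda i: abs(X[i] - x))[:k]
    return sum(labels[i] for i in nearest) / k

# Toy 1-D minimisation of f(x) = (x - 0.3)^2 on [0, 1]
X = [i / 9 for i in range(10)]
y = [(x - 0.3) ** 2 for x in X]
labels = bore_labels(y)
candidates = [i / 99 for i in range(100)]
best = max(candidates, key=lambda x: acquisition(x, X, labels))
```

The candidate maximising the classifier probability lands near the true minimiser 0.3, which would be the next point to evaluate expensively.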
    A Temporal-oriented Broadcast ResNet for COVID-19 Detection. (arXiv:2203.17012v1 [cs.SD])
    Detecting COVID-19 from audio signals, such as breathing and coughing, can be used as a fast and efficient pre-testing method to reduce virus transmission. Due to the promising results of deep learning networks in modelling time sequences, and since applications to rapidly identify COVID in-the-wild should require low computational effort, we present a temporal-oriented broadcasting residual learning method that achieves efficient computation and high accuracy with a small model size. Based on the EfficientNet architecture, our novel network, named Temporal-oriented ResNet~(TorNet), consists of a broadcasting learning block, i.e. the Alternating Broadcast (AB) Block, which contains several Broadcast Residual Blocks (BC ResBlocks) and a convolution layer. With the AB Block, the network obtains useful audio-temporal features and higher-level embeddings effectively with much less computation than Recurrent Neural Networks~(RNNs), typically used to model temporal information. TorNet achieves 72.2% Unweighted Average Recall (UAR) on the INTERSPEECH 2021 Computational Paralinguistics Challenge COVID-19 Cough Sub-Challenge, thereby showing competitive results with a higher computational efficiency than other state-of-the-art alternatives.
    Online Learning for Traffic Routing under Unknown Preferences. (arXiv:2203.17150v1 [cs.LG])
    In transportation networks, users typically choose routes in a decentralized and self-interested manner to minimize their individual travel costs, which, in practice, often results in inefficient overall outcomes for society. As a result, there has been a growing interest in designing road tolling schemes to cope with these efficiency losses and steer users toward a system-efficient traffic pattern. However, the efficacy of road tolling schemes often relies on having access to complete information on users' trip attributes, such as their origin-destination (O-D) travel information and their values of time, which may not be available in practice. Motivated by this practical consideration, we propose an online learning approach to set tolls in a traffic network to drive heterogeneous users with different values of time toward a system-efficient traffic pattern. In particular, we develop a simple yet effective algorithm that adjusts tolls at each time period solely based on the observed aggregate flows on the roads of the network without relying on any additional trip attributes of users, thereby preserving user privacy. In the setting where the O-D pairs and values of time of users are drawn i.i.d. at each period, we show that our approach obtains an expected regret and road capacity violation of $O(\sqrt{T})$, where $T$ is the number of periods over which tolls are updated. Our regret guarantee is relative to an offline oracle that has complete information on users' trip attributes. We further establish a $\Omega(\sqrt{T})$ lower bound on the regret of any algorithm, which establishes that our algorithm is optimal up to constants. Finally, we demonstrate the superior performance of our approach relative to several benchmarks on a real-world transportation network, thereby highlighting its practical applicability.
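The flow-based idea can be sketched as a projected-subgradient toll rule: raise the toll on any road whose observed aggregate flow exceeds capacity, lower it otherwise, and clip at zero, using no trip attributes at all. The step size `eta` and the two-road example are illustrative assumptions; the paper's actual update and its regret guarantees are more involved.

```python
def update_tolls(tolls, flows, capacity, eta=0.1):
    """Adjust each road's toll using only observed aggregate flows:
    over-capacity roads get pricier, under-used ones cheaper,
    projected onto non-negative tolls."""
    return [max(0.0, t + eta * (f - c)) for t, f, c in zip(tolls, flows, capacity)]

# One period on a two-road network: road 0 is over capacity, road 1 is not
tolls = update_tolls([1.0, 1.0], flows=[70.0, 30.0], capacity=[50.0, 50.0])
```

Iterating this rule over periods steers self-interested users toward flows that respect capacities, in the spirit of the paper's privacy-preserving scheme.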
    Generative Flows with Invertible Attentions. (arXiv:2106.03959v4 [cs.LG] UPDATED)
    Flow-based generative models have shown an excellent ability to explicitly learn the probability density function of data via a sequence of invertible transformations. Yet, learning attentions in generative flows remains understudied, while it has made breakthroughs in other domains. To fill the gap, this paper introduces two types of invertible attention mechanisms, i.e., map-based and transformer-based attentions, for both unconditional and conditional generative flows. The key idea is to exploit a masked scheme of these two attentions to learn long-range data dependencies in the context of generative flows. The masked scheme allows for invertible attention modules with tractable Jacobian determinants, enabling its seamless integration at any positions of the flow-based models. The proposed attention mechanisms lead to more efficient generative flows, due to their capability of modeling the long-term data dependencies. Evaluation on multiple image synthesis tasks shows that the proposed attention flows result in efficient models and compare favorably against the state-of-the-art unconditional and conditional generative flows.
    Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks. (arXiv:2203.17030v1 [cs.CV])
    New classes arise frequently in our ever-changing world, e.g., emerging topics in social media and new types of products in e-commerce. A model should recognize new classes and meanwhile maintain discriminability over old classes. Under severe circumstances, only limited novel instances are available to incrementally update the model. The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL). In this work, we propose a new paradigm for FSCIL based on meta-learning by LearnIng Multi-phase Incremental Tasks (LIMIT), which synthesizes fake FSCIL tasks from the base dataset. The data format of fake tasks is consistent with the 'real' incremental tasks, and we can build a generalizable feature space for the unseen tasks through meta-learning. Besides, LIMIT also constructs a calibration module based on a transformer, which calibrates the old-class classifiers and new-class prototypes into the same scale and fills in the semantic gap. The calibration module also adaptively contextualizes the instance-specific embedding with a set-to-set function. LIMIT efficiently adapts to new classes and meanwhile resists forgetting of old classes. Experiments on three benchmark datasets (CIFAR100, miniImageNet, and CUB200) and the large-scale ImageNet ILSVRC2012 dataset validate that LIMIT achieves state-of-the-art performance.
    Predicting Winners of the Reality TV Dating Show $\textit{The Bachelor}$ Using Machine Learning Algorithms. (arXiv:2203.16648v1 [cs.LG])
    $\textit{The Bachelor}$ is a reality TV dating show in which a single bachelor selects his wife from a pool of approximately 30 female contestants over eight weeks of filming (American Broadcasting Company 2002). We collected the following data on all 422 contestants that participated in seasons 11 through 25: their Age, Hometown, Career, Race, Week they got their first 1-on-1 date, whether they got the first impression rose, and what "place" they ended up getting. We then trained three machine learning models to predict the ideal characteristics of a successful contestant on $\textit{The Bachelor}$. The three algorithms that we tested were: random forest classification, neural networks, and linear regression. We found consistency across all three models, although the neural network performed the best overall. Our models found that a woman has the highest probability of progressing far on $\textit{The Bachelor}$ if she is: 26 years old, white, from the Northwest, works as a dancer, received a 1-on-1 in week 6, and did not receive the First Impression Rose. Our methodology is broadly applicable to all romantic reality television, and our results will inform future $\textit{The Bachelor}$ production and contestant strategies. While our models were relatively successful, we still encountered high misclassification rates. This may be because: (1) Our training dataset had fewer than 400 points or (2) Our models were too simple to parameterize the complex romantic connections contestants forge over the course of a season.
    When Physics Meets Machine Learning: A Survey of Physics-Informed Machine Learning. (arXiv:2203.16797v1 [cs.LG])
    Physics-informed machine learning (PIML), which combines prior knowledge of physics (high-level abstractions of natural phenomena and human behaviour accumulated over a long history) with data-driven machine learning models, has emerged as an effective way to mitigate the shortage of training data, to increase models' generalizability and to ensure the physical plausibility of results. In this paper, we survey an abundant number of recent works in PIML and summarize them from three aspects: (1) motivations of PIML, (2) physics knowledge in PIML, (3) methods of physics knowledge integration in PIML. We also discuss current challenges and corresponding research opportunities in PIML.
    A Derivation of Nesterov's Accelerated Gradient Algorithm from Optimal Control Theory. (arXiv:2203.17226v1 [math.OC])
    Nesterov's accelerated gradient algorithm is derived from first principles. The first principles are founded on the recently-developed optimal control theory for optimization. This theory frames an optimization problem as an optimal control problem whose trajectories generate various continuous-time algorithms. The algorithmic trajectories satisfy the necessary conditions for optimal control. The necessary conditions produce a controllable dynamical system for accelerated optimization. Stabilizing this system via a quadratic control Lyapunov function generates an ordinary differential equation. An Euler discretization of the resulting differential equation produces Nesterov's algorithm. In this context, this result solves the purported mystery surrounding the algorithm.
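For reference, the discrete algorithm that the derivation arrives at can be sketched in a few lines (the standard textbook form with the (t-1)/(t+2) momentum schedule; the step size and test function below are arbitrary illustrations, not part of the paper):

```python
def nesterov(grad, x0, lr, iters):
    """Nesterov's accelerated gradient: take the gradient step
    at an extrapolated 'look-ahead' point, not at the iterate itself."""
    x, x_prev = x0, x0
    for t in range(1, iters + 1):
        momentum = (t - 1) / (t + 2)
        y = x + momentum * (x - x_prev)   # look-ahead (extrapolation) point
        x, x_prev = y - lr * grad(y), x   # gradient step at y
    return x

# Minimise f(x) = 0.5 * x^2 (gradient x); the minimiser is 0
x = nesterov(lambda v: v, x0=10.0, lr=0.1, iters=1000)
```

The look-ahead step is exactly the discretised stabilising control discussed in the abstract; dropping the momentum term recovers plain gradient descent.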
    Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load. (arXiv:2203.16637v1 [cs.SD])
    As a neurophysiological response to threat or adverse conditions, stress can affect cognition, emotion and behaviour with potentially detrimental effects on health in the case of sustained exposure. Since the affective content of speech is inherently modulated by an individual's physical and mental state, a substantial body of research has been devoted to the study of paralinguistic correlates of stress-inducing task load. Historically, voice stress analysis (VSA) has been conducted using conventional digital signal processing (DSP) techniques. Despite the development of modern methods based on deep neural networks (DNNs), accurately detecting stress in speech remains difficult due to the wide variety of stressors and considerable variability in the individual stress perception. To that end, we introduce a set of five datasets for task load detection in speech. The voice recordings were collected as either cognitive or physical stress was induced in the cohort of volunteers, with a cumulative number of more than a hundred speakers. We used the datasets to design and evaluate a novel self-supervised audio representation that leverages the effectiveness of handcrafted features (DSP-based) and the complexity of data-driven DNN representations. Notably, the proposed approach outperformed both extensive handcrafted feature sets and novel DNN-based audio representation learning approaches.
    An unsupervised cluster-level based method for learning node representations of heterogeneous graphs in scientific papers. (arXiv:2203.16751v1 [cs.LG])
    Learning knowledge representations from scientific paper data is an open problem, and learning the representations of paper nodes in heterogeneous scientific-paper networks is at its core. This paper proposes an unsupervised cluster-level scientific paper heterogeneous graph node representation learning method (UCHL), aiming at obtaining the representations of nodes (authors, institutions, papers, etc.) in the heterogeneous graph of scientific papers. Based on the heterogeneous graph representation, this paper performs link prediction on the entire heterogeneous graph and obtains the relationships between the edges of the nodes, that is, the relationships between papers. Experimental results show that the proposed method achieves excellent performance on multiple evaluation metrics on real scientific paper datasets.
    Neural Architecture Search for Speech Emotion Recognition. (arXiv:2203.16928v1 [cs.SD])
    Deep neural networks have brought significant advancements to speech emotion recognition (SER). However, the architecture design in SER is mainly based on expert knowledge and empirical (trial-and-error) evaluations, which is time-consuming and resource intensive. In this paper, we propose to apply neural architecture search (NAS) techniques to automatically configure the SER models. To accelerate the candidate architecture optimization, we propose a uniform path dropout strategy to encourage all candidate architecture operations to be equally optimized. Experimental results of two different neural structures on IEMOCAP show that NAS can improve SER performance (54.89\% to 56.28\%) while maintaining model parameter sizes. The proposed dropout strategy also shows superiority over the previous approaches.
    Acoustic-Net: A Novel Neural Network for Sound Localization and Quantification. (arXiv:2203.16988v1 [cs.SD])
    Acoustic source localization has been applied in different fields, such as aeronautics and ocean science, generally using multiple-microphone array data to reconstruct the source location. However, the model-based beamforming methods fail to achieve the high resolution of conventional beamforming maps. Deep neural networks are also appropriate for locating the sound source, but in general, these methods with complex network structures are hard to deploy on hardware. In this paper, a novel neural network, termed the Acoustic-Net, is proposed to locate and quantify the sound source simply using the original signals. The experiments demonstrate that the proposed method significantly improves the accuracy of sound source prediction and the computing speed, which may generalize well to real data. The code and trained models are available at https://github.com/JoaquinChou/Acoustic-Net.
    A survey of neural models for the automatic analysis of conversation: Towards a better integration of the social sciences. (arXiv:2203.16891v1 [cs.CL])
    Some exciting new approaches to neural architectures for the analysis of conversation have been introduced over the past couple of years. These include neural architectures for detecting emotion, dialogue acts, and sentiment polarity. They take advantage of some of the key attributes of contemporary machine learning, such as recurrent neural networks with attention mechanisms and transformer-based approaches. However, while the architectures themselves are extremely promising, the phenomena to which they have been applied to date are but a small part of what makes conversation engaging. In this paper we survey these neural architectures and what they have been applied to. On the basis of the social science literature, we then describe what we believe to be the most fundamental and definitional feature of conversation, which is its co-construction over time by two or more interlocutors. We discuss how neural architectures of the sort surveyed could profitably be applied to these more fundamental aspects of conversation, and what this buys us in terms of a better analysis of conversation and even, in the longer term, a better way of generating conversation for a conversational system.
    Preventing Over-Smoothing for Hypergraph Neural Networks. (arXiv:2203.17159v1 [cs.LG])
    In recent years, hypergraph learning has attracted great attention due to its capacity in representing complex and high-order relationships. However, current neural network approaches designed for hypergraphs are mostly shallow, thus limiting their ability to extract information from high-order neighbors. In this paper, we show both theoretically and empirically, that the performance of hypergraph neural networks does not improve as the number of layers increases, which is known as the over-smoothing problem. To tackle this issue, we develop a new deep hypergraph convolutional network called Deep-HGCN, which can maintain the heterogeneity of node representation in deep layers. Specifically, we prove that a $k$-layer Deep-HGCN simulates a polynomial filter of order $k$ with arbitrary coefficients, which can relieve the problem of over-smoothing. Experimental results on various datasets demonstrate the superior performance of the proposed model comparing to the state-of-the-art hypergraph learning approaches.
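The polynomial-filter claim can be made concrete: a k-layer propagation computes p(L)x = c_0 x + c_1 Lx + ... + c_k L^k x, and Horner's scheme evaluates this with exactly k matrix-vector products, one per layer. The tiny dense matrices below are illustrative assumptions; Deep-HGCN operates on (hyper)graph Laplacians with learned coefficients.

```python
def matvec(M, v):
    """Dense matrix-vector product (stand-in for sparse Laplacian propagation)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def poly_filter(L, coeffs, x):
    """Apply p(L)x = sum_i coeffs[i] * L^i x via Horner's scheme:
    p(L)x = c0*x + L*(c1*x + L*(c2*x + ...)), one matvec per 'layer'."""
    out = [coeffs[-1] * xi for xi in x]
    for c in reversed(coeffs[:-1]):
        out = [c * xi + yi for xi, yi in zip(x, matvec(L, out))]
    return out

# p(I)x with coeffs (1,1,1) is 3x; with L a swap matrix and coeffs (0,1), p(L)x = Lx
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
out1 = poly_filter(identity, [1.0, 1.0, 1.0], [1.0, 2.0])
out2 = poly_filter(swap, [0.0, 1.0], [1.0, 2.0])
```

Allowing arbitrary coefficients is what lets deep layers avoid collapsing all node representations to the same smooth signal.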
    Learning from few examples with nonlinear feature maps. (arXiv:2203.16935v1 [cs.LG])
    In this work we consider the problem of data classification in post-classical settings where the number of training examples consists of a mere few data points. We explore the phenomenon and reveal key relationships between the dimensionality of an AI model's feature space, the non-degeneracy of data distributions, and the model's generalisation capabilities. The main thrust of our analysis concerns the influence of nonlinear feature transformations, mapping the original data into higher- and possibly infinite-dimensional spaces, on the resulting model's generalisation capabilities. Subject to appropriate assumptions, we establish new relationships between the intrinsic dimensions of the transformed data and the probabilities of learning successfully from few presentations.
    Ransomware Detection using Process Memory. (arXiv:2203.16871v1 [cs.CR])
    Ransomware attacks have increased significantly in recent years, causing great destruction and damage to critical systems and business operations. Attackers are unfailingly finding innovative ways to bypass detection mechanisms, which has encouraged the adoption of artificial intelligence. However, most research summarizes the general features of AI and induces many false positives, as the behavior of ransomware constantly changes to bypass detection. Focusing on the key indicating features of ransomware becomes vital, as this guides the investigator to the inner workings and main function of the ransomware itself. By utilizing access privileges in process memory, the main function of the ransomware can be detected more easily and accurately. Furthermore, new signatures and fingerprints of ransomware families can be identified to classify novel ransomware attacks correctly. The current research used the process memory access privileges of an executable's different memory regions to quickly determine its intent before serious harm can occur. To achieve this aim, several well-known machine learning algorithms were explored, with accuracies ranging from 81.38 to 96.28 percent. The study thus confirms the feasibility of utilizing process memory as a detection mechanism for ransomware.
    An Optimal Control Method to Compute the Most Likely Transition Path for Stochastic Dynamical Systems with Jumps. (arXiv:2203.16874v1 [math.NA])
    Many complex real world phenomena exhibit abrupt, intermittent or jumping behaviors, which are more suitable to be described by stochastic differential equations under non-Gaussian L\'evy noise. Among these complex phenomena, the most likely transition paths between metastable states are important since these rare events may have high impact in certain scenarios. Based on the large deviation principle, the most likely transition path could be treated as the minimizer of the rate function upon paths that connect two points. One of the challenges to calculate the most likely transition path for stochastic dynamical systems under non-Gaussian L\'evy noise is that the associated rate function can not be explicitly expressed by paths. For this reason, we formulate an optimal control problem to obtain the optimal state as the most likely transition path. We then develop a neural network method to solve this issue. Several experiments are investigated for both Gaussian and non-Gaussian cases.
    An analytic theory for the dynamics of wide quantum neural networks. (arXiv:2203.16711v1 [quant-ph])
    Parametrized quantum circuits can be used as quantum neural networks and have the potential to outperform their classical counterparts when trained for addressing learning problems. To date, much of the results on their performance on practical problems are heuristic in nature. In particular, the convergence rate for the training of quantum neural networks is not fully understood. Here, we analyze the dynamics of gradient descent for the training error of a class of variational quantum machine learning models. We define wide quantum neural networks as parameterized quantum circuits in the limit of a large number of qubits and variational parameters. We then find a simple analytic formula that captures the average behavior of their loss function and discuss the consequences of our findings. For example, for random quantum circuits, we predict and characterize an exponential decay of the residual training error as a function of the parameters of the system. We finally validate our analytic results with numerical experiments.
    JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech. (arXiv:2203.16852v1 [eess.AS])
    In neural text-to-speech (TTS), two-stage systems, i.e., cascades of separately learned models, have shown synthesis quality close to human speech. For example, FastSpeech2 transforms an input text into a mel-spectrogram and then HiFi-GAN generates a raw waveform from the mel-spectrogram, where the two are called an acoustic feature generator and a neural vocoder, respectively. However, their training pipeline is somewhat cumbersome in that it requires fine-tuning and an accurate speech-text alignment for optimal performance. In this work, we present an end-to-end text-to-speech (E2E-TTS) model which has a simplified training pipeline and outperforms a cascade of separately learned models. Specifically, our proposed model is a jointly trained FastSpeech2 and HiFi-GAN with an alignment module. Since there is no acoustic feature mismatch between training and inference, it does not require fine-tuning. Furthermore, we remove the dependency on an external speech-text alignment tool by adopting an alignment learning objective in our joint training framework. Experiments on the LJSpeech corpus show that the proposed model outperforms publicly available, state-of-the-art implementations of ESPnet2-TTS on subjective evaluation (MOS) and some objective evaluations.
    Data-driven Prediction of Relevant Scenarios for Robust Optimization. (arXiv:2203.16642v1 [math.OC])
    In this work we study robust one- and two-stage problems with discrete uncertainty sets which are known to be hard to solve even if the underlying deterministic problem is easy. Popular solution methods iteratively generate scenario constraints and possibly second-stage variables. This way, by solving a sequence of smaller problems, it is often possible to avoid the complexity of considering all scenarios simultaneously. A key ingredient for the performance of the iterative methods is a good selection of start scenarios. In this paper we propose a data-driven heuristic to seed the iterative solution method with a set of starting scenarios that provide a strong lower bound early in the process, and result in considerably smaller overall solution times compared to other benchmark methods. Our heuristic learns the relevance of a scenario by extracting information from training data based on a combined similarity measure between robust problem instances and single scenarios. Our experiments show that predicting even a small number of good start scenarios by our method can considerably reduce the computation time of the iterative methods.
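The iterative method that the proposed heuristic seeds can be sketched on a toy robust problem: solve the master over the current scenario subset, add the most violated scenario, and repeat until no scenario outside the subset is binding. The linear scenarios, grid search, and starting set below are illustrative assumptions; the paper's contribution is learning which start scenarios make this loop terminate quickly.

```python
def solve_master(scenarios, grid):
    """Minimise the worst case over the current scenario subset (grid search)."""
    return min(grid, key=lambda x: max(a * x + b for a, b in scenarios))

def iterative_robust(all_scenarios, start, grid):
    """Scenario generation: keep adding the most violated scenario
    until the master solution is robust against every scenario."""
    active = list(start)
    while True:
        x = solve_master(active, grid)
        worst = max(all_scenarios, key=lambda s: s[0] * x + s[1])
        if worst in active:
            return x, active
        active.append(worst)

# Scenarios f_s(x) = a*x + b; the robust optimum of max(x, 1-x, 0.5x+0.1) is x = 0.5
scenarios = [(1.0, 0.0), (-1.0, 1.0), (0.5, 0.1)]
grid = [i / 100 for i in range(101)]
x, used = iterative_robust(scenarios, start=[scenarios[0]], grid=grid)
```

Here the loop terminates after adding a single extra scenario: a good start set means fewer, smaller master problems, which is precisely what the learned seeding targets.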
    Recovering models of open quantum systems from data via polynomial optimization: Towards globally convergent quantum system identification. (arXiv:2203.17164v1 [quant-ph])
    Current quantum devices suffer imperfections as a result of fabrication, as well as noise and dissipation as a result of coupling to their immediate environments. Because of this, it is often difficult to obtain accurate models of their dynamics from first principles. An alternative is to extract such models from time-series measurements of their behavior. Here, we formulate this system-identification problem as a polynomial optimization problem. Recent advances in optimization have provided globally convergent solvers for this class of problems, which, with our formulation, yield provable estimates of the Kraus map or the Lindblad equation. We include an overview of the state-of-the-art algorithms, bounds, and convergence rates, and illustrate the use of this approach to modeling open quantum systems.
    SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping. (arXiv:2203.16749v1 [eess.AS])
    Neural vocoder using denoising diffusion probabilistic model (DDPM) has been improved by adaptation of the diffusion noise distribution to given acoustic features. In this study, we propose SpecGrad that adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram. This adaptation by time-varying filtering improves the sound quality especially in the high-frequency bands. It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders. Experimental results showed that SpecGrad generates higher-fidelity speech waveform than conventional DDPM-based neural vocoders in both analysis-synthesis and speech enhancement scenarios. Audio demos are available at wavegrad.github.io/specgrad/.
    Distributionally Robust Batch Contextual Bandits. (arXiv:2006.05630v4 [cs.LG] UPDATED)
    Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data -- an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm in synthetic datasets and demonstrate that it provides the robustness that is missing using standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.
    Calibrating constitutive models with full-field data via physics informed neural networks. (arXiv:2203.16577v1 [cs.LG])
    The calibration of solid constitutive models with full-field experimental data is a long-standing challenge, especially in materials which undergo large deformation. In this paper, we propose a physics-informed deep-learning framework for the discovery of constitutive model parameterizations given full-field displacement data and global force-displacement data. Contrary to the majority of recent literature in this field, we work with the weak form of the governing equations rather than the strong form to impose physical constraints upon the neural network predictions. The approach presented in this paper is computationally efficient, suitable for irregular geometric domains, and readily ingests displacement data without the need for interpolation onto a computational grid. A selection of canonical hyperelastic material models suitable for different material classes is considered, including the Neo-Hookean, Gent, and Blatz-Ko constitutive models as exemplars for general hyperelastic behavior, polymer behavior with lock-up, and compressible foam behavior, respectively. We demonstrate that physics-informed machine learning is an enabling technology and may shift the paradigm of how full-field experimental data is utilized to calibrate constitutive models under finite deformations.
    Mind the gap: Challenges of deep learning approaches to Theory of Mind. (arXiv:2203.16540v1 [cs.LG])
    Theory of Mind is an essential ability of humans to infer the mental states of others. Here we provide a coherent summary of the potential, current progress, and problems of deep learning approaches to Theory of Mind. We highlight that many current findings can be explained through shortcuts. These shortcuts arise because the tasks used to investigate Theory of Mind in deep learning systems have been too narrow. Thus, we encourage researchers to investigate Theory of Mind in complex open-ended environments. Furthermore, to inspire future deep learning systems we provide a concise overview of prior work done in humans. We further argue that when studying Theory of Mind with deep learning, the research's main focus and contribution ought to be opening up the network's representations. We recommend researchers use tools from the field of interpretability of AI to study the relationship between different network components and aspects of Theory of Mind.  ( 2 min )
    A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and Nonsmooth Bi-level Optimization. (arXiv:2203.16615v1 [cs.LG])
    Many important machine learning applications involve regularized nonconvex bi-level optimization. However, the existing gradient-based bi-level optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and they suffer from a high computation complexity in nonconvex bi-level optimization. In this work, we study a proximal gradient-type algorithm that adopts the approximate implicit differentiation (AID) scheme for nonconvex bi-level optimization with possibly nonconvex and nonsmooth regularizers. In particular, the algorithm applies Nesterov's momentum to accelerate the computation of the implicit gradient involved in AID. We provide a comprehensive analysis of the global convergence properties of this algorithm through identifying its intrinsic potential function. In particular, we formally establish the convergence of the model parameters to a critical point of the bi-level problem, and obtain an improved computation complexity $\mathcal{O}(\kappa^{3.5}\epsilon^{-2})$ over the state-of-the-art result. Moreover, we analyze the asymptotic convergence rates of this algorithm under a class of local nonconvex geometries characterized by a {\L}ojasiewicz-type gradient inequality. Experiments on hyper-parameter optimization demonstrate the effectiveness of our algorithm.  ( 2 min )
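The implicit-differentiation (AID) idea behind such algorithms can be illustrated on a toy quadratic bilevel problem. The sketch below uses plain gradient descent, without the proximal step or Nesterov momentum of the paper's algorithm, and the problem instance is invented for illustration:

```python
# Toy bilevel problem (invented for illustration):
#   lower level: y*(x) = argmin_y g(x, y),  g(x, y) = 0.5*y**2 - x*y  =>  y*(x) = x
#   upper level: minimize F(x, y*(x)),      F(x, y) = 0.5*(y - 3.0)**2
# AID-style implicit hypergradient:
#   dF/dx = grad_x F - grad_xy g * (grad_yy g)^{-1} * grad_y F
def hypergradient(x):
    y_star = x                      # lower-level solution (closed form here)
    grad_y_F = y_star - 3.0         # dF/dy evaluated at y*
    g_yy, g_xy = 1.0, -1.0          # second derivatives of g
    return 0.0 - g_xy * (1.0 / g_yy) * grad_y_F

x = 0.0
for _ in range(100):                # plain gradient descent on the upper level
    x -= 0.1 * hypergradient(x)
print(x)                            # converges toward the bilevel optimum x = 3
```

In practice the lower-level solution and the inverse Hessian-vector product are only approximated, which is where the AID scheme and its acceleration matter.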
    Towards Differential Relational Privacy and its use in Question Answering. (arXiv:2203.16701v1 [cs.LG])
    Memorization of the relation between entities in a dataset can lead to privacy issues when using a trained model for question answering. We introduce Relational Memorization (RM) to understand, quantify and control this phenomenon. While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning. The difference is most pronounced when the data distribution is long-tailed, with many queries having only few training examples: Impeding general memorization prevents effective learning, while impeding only relational memorization still allows learning general properties of the underlying concepts. We formalize the notion of Relational Privacy (RP) and, inspired by Differential Privacy (DP), we provide a possible definition of Differential Relational Privacy (DrP). These notions can be used to describe and compute bounds on the amount of RM in a trained model. We illustrate Relational Privacy concepts in experiments with large-scale models for Question Answering.  ( 2 min )
    Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracle. (arXiv:2203.16668v1 [cs.LG])
    Many popular contextual bandit algorithms estimate reward models to inform decision making. However, true rewards can contain action-independent redundancies that are not relevant for decision making and only increase the statistical complexity of accurate estimation. It is sufficient and more data-efficient to estimate the simplest function that explains the reward differences between actions, that is, the heterogeneous treatment effect, commonly understood to be more structured and simpler than the reward. Motivated by this observation, building on recent work on oracle-based algorithms, we design a statistically optimal and computationally efficient algorithm using heterogeneous treatment effect estimation oracles. Our results provide the first universal reduction of contextual bandits to a general-purpose heterogeneous treatment effect estimation method. We show that our approach is more robust to model misspecification than reward estimation methods based on squared error regression oracles. Experimentally, we show the benefits of heterogeneous treatment effect estimation in contextual bandits over reward estimation.  ( 2 min )
    Monte Carlo Tree Search based Hybrid Optimization of Variational Quantum Circuits. (arXiv:2203.16707v1 [quant-ph])
    Variational quantum algorithms stand at the forefront of simulations on near-term and future fault-tolerant quantum devices. While most variational quantum algorithms involve only continuous optimization variables, the representational power of the variational ansatz can sometimes be significantly enhanced by adding certain discrete optimization variables, as is exemplified by the generalized quantum approximate optimization algorithm (QAOA). However, the hybrid discrete-continuous optimization problem in the generalized QAOA poses a challenge to the optimization. We propose a new algorithm called MCTS-QAOA, which combines a Monte Carlo tree search method with an improved natural policy gradient solver to optimize the discrete and continuous variables in the quantum circuit, respectively. We find that MCTS-QAOA has excellent noise-resilience properties and outperforms prior algorithms in challenging instances of the generalized QAOA.  ( 2 min )
    Recent improvements of ASR models in the face of adversarial attacks. (arXiv:2203.16536v1 [cs.CR])
    Like many other tasks involving neural networks, Speech Recognition models are vulnerable to adversarial attacks. However, recent research has pointed out differences between attacks and defenses on ASR models compared to image models. Improving the robustness of ASR models requires a paradigm shift from evaluating attacks on one or a few models to a systemic approach in evaluation. We lay the ground for such research by evaluating on various architectures a representative set of adversarial attacks: targeted and untargeted, optimization-based and speech processing-based, white-box and black-box attacks. Our results show that the relative strengths of different attack algorithms vary considerably when changing the model architecture, and that the results of some attacks are not to be blindly trusted. They also indicate that training choices such as self-supervised pretraining can significantly impact robustness by enabling transferable perturbations. We release our source code as a package that should help future research in evaluating their attacks and defenses.  ( 2 min )
    Parallel framework for Dynamic Domain Decomposition of Data Assimilation problems: a case study on Kalman Filter algorithm. (arXiv:2203.16535v1 [cs.LG])
    We focus on Partial Differential Equation (PDE) based Data Assimilation (DA) problems solved by means of variational approaches and the Kalman filter algorithm. Recently, we presented a Domain Decomposition framework (we call it DD-DA, for short) performing a decomposition of the whole physical domain along space and time directions, and joining the ideas of Schwarz methods and parallel-in-time approaches. For effective parallelization of DD-DA algorithms, the computational load assigned to subdomains must be equally distributed. Usually, computational cost is proportional to the amount of data entities assigned to partitions. Good quality partitioning also requires the volume of communication during calculation to be kept at its minimum. In order to deal with DD-DA problems where the observations are nonuniformly distributed and generally sparse, in the present work we employ a parallel load balancing algorithm based on adaptive and dynamic definition of the DD boundaries -- aimed at balancing workload according to data location. We call it DyDD. As the numerical model underlying DA problems arising from the so-called discretize-then-optimize approach is the constrained least squares model (CLS), we use CLS as a reference state estimation problem and validate DyDD on different scenarios.  ( 2 min )
    Active Learning for Computationally Efficient Distribution of Binary Evolution Simulations. (arXiv:2203.16683v1 [astro-ph.SR])
    Binary stars undergo a variety of interactions and evolutionary phases, critical for predicting and explaining observed properties. Binary population synthesis with full stellar-structure and evolution simulations is computationally expensive, requiring a large number of mass-transfer sequences. The recently developed binary population synthesis code POSYDON incorporates grids of MESA binary star simulations which are then interpolated to model large-scale populations of massive binaries. The traditional method of computing a high-density rectilinear grid of simulations is not scalable for higher-dimensional grids, accounting for a range of metallicities, rotation, and eccentricity. We present a new active learning algorithm, psy-cris, which uses machine learning in the data-gathering process to adaptively and iteratively select targeted simulations to run, resulting in a custom, high-performance training set. We test psy-cris on a toy problem and find the resulting training sets require fewer simulations for accurate classification and regression than either regular or randomly sampled grids. We further apply psy-cris to the target problem of building a dynamic grid of MESA simulations, and we demonstrate that, even without fine-tuning, a simulation set of only $\sim 1/4$ the size of a rectilinear grid is sufficient to achieve the same classification accuracy. We anticipate further gains when algorithmic parameters are optimized for the targeted application. We find that optimizing for classification only may lead to performance losses in regression, and vice versa. Lowering the computational cost of producing grids will enable future versions of POSYDON to cover more input parameters while preserving interpolation accuracies.  ( 2 min )
    Physics-constrained Unsupervised Learning of Partial Differential Equations using Meshes. (arXiv:2203.16628v1 [cs.LG])
    Enhancing neural networks with knowledge of physical equations has become an efficient way of solving various physics problems, from fluid flow to electromagnetism. Graph neural networks show promise in accurately representing irregularly meshed objects and learning their dynamics, but have so far required supervision through large datasets. In this work, we represent meshes naturally as graphs, process these using Graph Networks, and formulate our physics-based loss to provide an unsupervised learning framework for partial differential equations (PDEs). We quantitatively compare our results to a classical numerical PDE solver, and show that our computationally efficient approach can be used as an interactive PDE solver that adjusts boundary conditions in real time and remains sufficiently close to the baseline solution. Our inherently differentiable framework will enable the application of PDE solvers in interactive settings, such as model-based control of soft-body deformations, or in gradient-based optimization methods that require a fully differentiable pipeline.  ( 2 min )
    Federated Learning for the Classification of Tumor Infiltrating Lymphocytes. (arXiv:2203.16622v1 [eess.IV])
    We evaluate the performance of federated learning (FL) in developing deep learning models for analysis of digitized tissue sections. A classification application was considered as the example use case, quantifying the distribution of tumor infiltrating lymphocytes within whole slide images (WSIs). A deep learning classification model was trained using 50*50 square micron patches extracted from the WSIs. We simulated a FL environment in which a dataset, generated from WSIs of cancer from numerous anatomical sites available from The Cancer Genome Atlas repository, is partitioned across 8 different nodes. Our results show that the model trained with the federated training approach achieves similar performance, both quantitatively and qualitatively, to that of a model trained with all the training data pooled at a centralized location. Our study shows that FL has tremendous potential for enabling development of more robust and accurate models for histopathology image analysis without having to collect large and diverse training data at a single location.  ( 2 min )
    Efficient Localness Transformer for Smart Sensor-Based Energy Disaggregation. (arXiv:2203.16537v1 [cs.LG])
    Modern smart sensor-based energy management systems leverage non-intrusive load monitoring (NILM) to predict and optimize appliance load distribution in real-time. NILM, or energy disaggregation, refers to the decomposition of electricity usage conditioned on the aggregated power signals (i.e., smart sensor on the main channel). Based on real-time appliance power prediction using sensory technology, energy disaggregation has great potential to increase electricity efficiency and reduce energy expenditure. With the introduction of transformer models, NILM has achieved significant improvements in predicting device power readings. Nevertheless, transformers are less efficient due to O(l^2) complexity w.r.t. sequence length l. Moreover, transformers can fail to capture local signal patterns in sequence-to-point settings due to the lack of inductive bias in local context. In this work, we propose an efficient localness transformer for non-intrusive load monitoring (ELTransformer). Specifically, we leverage normalization functions and switch the order of matrix multiplication to approximate self-attention and reduce computational complexity. Additionally, we introduce localness modeling with sparse local attention heads and relative position encodings to enhance the model capacity in extracting short-term local patterns. To the best of our knowledge, ELTransformer is the first NILM model that addresses computational complexity and localness modeling in NILM. With extensive experiments and quantitative analyses, we demonstrate the efficiency and effectiveness of the proposed ELTransformer with considerable improvements compared to state-of-the-art baselines.  ( 2 min )
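The "switch the order of matrix multiplication" idea can be sketched with a generic kernelized (linear) attention layer. The ReLU-plus-epsilon feature map below is an illustrative assumption, not necessarily the normalization function the paper uses:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized attention with the multiplication order switched:
    computing K^T V first makes the cost O(l * d^2) instead of O(l^2 * d)."""
    phi = lambda x: np.maximum(x, 0.0) + eps   # positive feature map (an assumption)
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                  # (d, d): independent of sequence length l
    Z = Qp @ Kp.sum(axis=0)        # (l,): per-row normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
l, d = 128, 16
Q, K, V = rng.standard_normal((3, l, d))
out = linear_attention(Q, K, V)    # shape (l, d)
```

The output is identical to forming the full l-by-l attention matrix with the same feature map; only the associativity of the products changes, which is what removes the quadratic dependence on sequence length.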
    Generation of Speaker Representations Using Heterogeneous Training Batch Assembly. (arXiv:2203.16646v1 [cs.SD])
    In traditional speaker diarization systems, a well-trained speaker model is a key component to extract representations from consecutive and partially overlapping segments in a long speech session. To be more consistent with the back-end segmentation and clustering, we propose a new CNN-based speaker modeling scheme, which takes into account the heterogeneity of the speakers in each training segment and batch. We randomly and synthetically augment the training data into a set of segments, each of which contains more than one speaker and some overlapping parts. A soft label is imposed on each segment based on its speaker occupation ratio, and the standard cross entropy loss is implemented in model training. In this way, the speaker model should have the ability to generate a geometrically meaningful embedding for each multi-speaker segment. Experimental results show that our system is superior to the baseline system using x-vectors in two speaker diarization tasks. In the CALLHOME task trained on the NIST SRE and Switchboard datasets, our system achieves a relative reduction of 12.93% in DER. In Track 2 of CHiME-6, our system provides 13.24%, 12.60%, and 5.65% relative reductions in DER, JER, and WER, respectively.  ( 2 min )
    Graph Refinement for Coreference Resolution. (arXiv:2203.16574v1 [cs.CL])
    The state-of-the-art models for coreference resolution are based on independent mention pair-wise decisions. We propose a modelling approach that learns coreference at the document-level and takes global decisions. For this purpose, we model coreference links in a graph structure where the nodes are tokens in the text, and the edges represent the relationship between them. Our model predicts the graph in a non-autoregressive manner, then iteratively refines it based on previous predictions, allowing global dependencies between decisions. The experimental results show improvements over various baselines, reinforcing the hypothesis that document-level information improves coreference resolution.  ( 2 min )
    Challenges in leveraging GANs for few-shot data augmentation. (arXiv:2203.16662v1 [stat.ML])
    In this paper, we explore the use of GAN-based few-shot data augmentation as a method to improve few-shot classification performance. We perform an exploration into how a GAN can be fine-tuned for such a task (one of which is in a class-incremental manner), as well as a rigorous empirical investigation into how well these models can perform to improve few-shot classification. We identify issues related to the difficulty of training such generative models under a purely supervised regime with very few examples, as well as issues regarding the evaluation protocols of existing works. We also find that in this regime, classification accuracy is highly sensitive to how the classes of the dataset are randomly split. Therefore, we propose a semi-supervised fine-tuning approach as a more pragmatic way forward to address these problems.  ( 2 min )
    Constrained Few-shot Class-incremental Learning. (arXiv:2203.16588v1 [cs.CV])
    Continually learning new classes from fresh data without forgetting previous knowledge of old classes is a very challenging research problem. Moreover, it is imperative that such learning must respect certain memory and computational constraints such as (i) training samples are limited to only a few per class, (ii) the computational cost of learning a novel class remains constant, and (iii) the memory footprint of the model grows at most linearly with the number of classes observed. To meet the above constraints, we propose C-FSCIL, which is architecturally composed of a frozen meta-learned feature extractor, a trainable fixed-size fully connected layer, and a rewritable dynamically growing memory that stores as many vectors as the number of encountered classes. C-FSCIL provides three update modes that offer a trade-off between accuracy and compute-memory cost of learning novel classes. C-FSCIL exploits hyperdimensional embedding that makes it possible to continually express many more classes than the fixed dimensions in the vector space, with minimal interference. The quality of class vector representations is further improved by aligning them quasi-orthogonally to each other by means of novel loss functions. Experiments on the CIFAR100, miniImageNet, and Omniglot datasets show that C-FSCIL outperforms the baselines with remarkable accuracy and compression. It also scales up to the largest problem size ever tried in this few-shot setting by learning 423 novel classes on top of 1200 base classes with less than 1.6% accuracy drop. Our code is available at https://github.com/IBM/constrained-FSCIL.  ( 2 min )
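The quasi-orthogonality that hyperdimensional embeddings rely on is easy to verify numerically. The sketch below (with illustrative sizes, not the paper's) checks that random bipolar class prototypes interfere only weakly:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes = 512, 100                      # illustrative sizes, not the paper's
# Random bipolar "hyperdimensional" class prototypes, one row per class.
protos = rng.choice([-1.0, 1.0], size=(n_classes, d))
protos /= np.sqrt(d)                         # unit-norm rows
G = protos @ protos.T                        # pairwise cosine similarities
off_diag = np.abs(G[~np.eye(n_classes, dtype=bool)])
# off_diag.max() is small: random high-dimensional vectors are quasi-orthogonal,
# so many class prototypes can coexist in a fixed-dimension space with little
# interference -- which is what the loss functions in the paper push further.
```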
    Identification of diffracted vortex beams at different propagation distances using deep learning. (arXiv:2203.16539v1 [cs.LG])
    Orbital angular momentum (OAM) of light is regarded as a valuable resource in quantum technology, especially in quantum communication and quantum sensing and ranging. However, the OAM state of light is susceptible to undesirable experimental conditions such as propagation distance and phase distortions, which hinders the potential for the realistic implementation of relevant technologies. In this article, we exploit an enhanced deep learning neural network to identify different OAM modes of light at multiple propagation distances with phase distortions. Specifically, our trained deep learning neural network can efficiently identify the vortex beam's topological charge and propagation distance with 97% accuracy. Our technique has important implications for OAM-based communication and sensing protocols.  ( 2 min )
    FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations. (arXiv:2203.16639v1 [cs.CV])
    We present a meta-learning framework for learning new visual concepts quickly, from just one or a few examples, guided by multiple naturally occurring data streams: simultaneously looking at images, reading sentences that describe the objects in the scene, and interpreting supplemental sentences that relate the novel concept with other concepts. The learned concepts support downstream applications, such as answering questions by reasoning about unseen images. Our model, namely FALCON, represents individual visual concepts, such as colors and shapes, as axis-aligned boxes in a high-dimensional space (the "box embedding space"). Given an input image and its paired sentence, our model first resolves the referential expression in the sentence and associates the novel concept with particular objects in the scene. Next, our model interprets supplemental sentences to relate the novel concept with other known concepts, such as "X has property Y" or "X is a kind of Y". Finally, it infers an optimal box embedding for the novel concept that jointly 1) maximizes the likelihood of the observed instances in the image, and 2) satisfies the relationships between the novel concepts and the known ones. We demonstrate the effectiveness of our model on both synthetic and real-world datasets.  ( 2 min )
    Transformer Language Models without Positional Encodings Still Learn Positional Information. (arXiv:2203.16634v1 [cs.CL])
    Transformers typically require some form of positional encoding, such as positional embeddings, to process natural language sequences. Surprisingly, we find that transformer language models without any explicit positional encoding are still competitive with standard models, and that this phenomenon is robust across different datasets, model sizes, and sequence lengths. Probing experiments reveal that such models acquire an implicit notion of absolute positions throughout the network, effectively compensating for the missing information. We conjecture that causal attention enables the model to infer the number of predecessors that each token can attend to, thereby approximating its absolute position.  ( 2 min )
    Machine Learning Approaches for Non-Intrusive Home Absence Detection Based on Appliance Electrical Use. (arXiv:2203.16538v1 [cs.LG])
    Home absence detection is an emerging field in smart home installations. Identifying whether or not the residents of the house are present is important in numerous scenarios. Possible scenarios include but are not limited to: elderly people living alone, people suffering from dementia, home quarantine. The majority of published papers focus on either pressure / door sensors or cameras in order to detect outing events. Although the aforementioned approaches provide solid results, they are intrusive and require modifications for sensor placement. In our work, appliance electrical use is investigated as a means of detecting the presence or absence of residents. The energy use is the result of power disaggregation, a non-intrusive/non-invasive sensing method. Since a dataset providing energy data and ground truth for home absence is not available, artificial outing events were introduced on the UK-DALE dataset, a well-known dataset for Non-Intrusive Load Monitoring (NILM). Several machine learning algorithms were evaluated using the generated dataset. Benchmark results have shown that home absence detection using appliance power consumption is feasible.  ( 2 min )

    A Single-Timescale Method for Stochastic Bilevel Optimization. (arXiv:2102.04671v4 [math.OC] UPDATED)
    Stochastic bilevel optimization generalizes the classic stochastic optimization from the minimization of a single objective to the minimization of an objective function that depends on the solution of another optimization problem. Recently, stochastic bilevel optimization is regaining popularity in emerging machine learning applications such as hyper-parameter optimization and model-agnostic meta learning. To solve this class of stochastic optimization problems, existing methods require either double-loop or two-timescale updates, which are sometimes less efficient. This paper develops a new optimization method for a class of stochastic bilevel problems that we term Single-Timescale stochAstic BiLevEl optimization (STABLE) method. STABLE runs in a single loop fashion, and uses a single-timescale update with a fixed batch size. To achieve an $\epsilon$-stationary point of the bilevel problem, STABLE requires ${\cal O}(\epsilon^{-2})$ samples in total; and to achieve an $\epsilon$-optimal solution in the strongly convex case, STABLE requires ${\cal O}(\epsilon^{-1})$ samples. To the best of our knowledge, this is the first bilevel optimization algorithm achieving the same order of sample complexity as the stochastic gradient descent method for the single-level stochastic optimization.
    Flat-topped Probability Density Functions for Mixture Models. (arXiv:2203.17027v1 [cs.LG])
    This paper investigates probability density functions (PDFs) that are continuous everywhere, nearly uniform around the mode of distribution, and adaptable to a variety of distribution shapes ranging from bell-shaped to rectangular. From the viewpoint of computational tractability, the PDF based on the Fermi-Dirac or logistic function is advantageous in estimating its shape parameters. The most appropriate PDF for an $n$-variate distribution is of the form: $p\left(\mathbf{x}\right)\propto\left[\cosh\left(\left[\left(\mathbf{x}-\mathbf{m}\right)^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\left(\mathbf{x}-\mathbf{m}\right)\right]^{n/2}\right)+\cosh\left(r^{n}\right)\right]^{-1}$ where $\mathbf{x},\mathbf{m}\in\mathbb{R}^{n}$, $\boldsymbol{\Sigma}$ is an $n\times n$ positive definite matrix, and $r>0$ is a shape parameter. The flat-topped PDFs can be used as a component of mixture models in machine learning to improve goodness of fit and make a model as simple as possible.
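A minimal evaluation of the abstract's (unnormalized) flat-topped density, here in the bivariate case; the mode location, covariance, and shape parameter below are illustrative choices:

```python
import numpy as np

def flat_top_density(x, m, Sigma_inv, r):
    """Unnormalized flat-topped PDF from the abstract, for n-variate x:
    p(x) ~ 1 / (cosh(q(x)^(n/2)) + cosh(r^n)), q(x) = (x-m)^T Sigma^{-1} (x-m)."""
    x = np.atleast_2d(x)
    n = x.shape[1]
    d = x - m
    q = np.einsum('ij,jk,ik->i', d, Sigma_inv, d)   # quadratic form per row
    return 1.0 / (np.cosh(q ** (n / 2.0)) + np.cosh(r ** n))

m = np.zeros(2)
Sigma_inv = np.eye(2)
# Evaluate at the mode, inside the plateau, and well outside it.
p = flat_top_density(np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]),
                     m, Sigma_inv, r=2.0)
# p is nearly flat while q <= r^2 and decays rapidly beyond the plateau
```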
    Equivariant Diffusion for Molecule Generation in 3D. (arXiv:2203.17003v1 [cs.LG])
    This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Our E(3) Equivariant Diffusion Model (EDM) learns to denoise a diffusion process with an equivariant network that jointly operates on both continuous (atom coordinates) and categorical features (atom types). In addition, we provide a probabilistic analysis which admits likelihood computation of molecules using our model. Experimentally, the proposed method significantly outperforms previous 3D molecular generative methods regarding the quality of generated samples and efficiency at training time.
    Model-based Reinforcement Learning: A Survey. (arXiv:2006.16712v4 [cs.LG] UPDATED)
    Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two sections, we also discuss implicit model-based RL as an end-to-end alternative for model learning and planning, and we cover the potential benefits of model-based RL. Along the way, the survey also draws connections to several related RL fields, like hierarchical RL and transfer learning. Altogether, the survey presents a broad conceptual overview of the combination of planning and learning for MDP optimization.
    STICC: A multivariate spatial clustering method for repeated geographic pattern discovery with consideration of spatial contiguity. (arXiv:2203.09611v2 [cs.LG] UPDATED)
    Spatial clustering has been widely used for spatial data mining and knowledge discovery. An ideal multivariate spatial clustering should consider both spatial contiguity and aspatial attributes. Existing spatial clustering approaches may face challenges for discovering repeated geographic patterns with spatial contiguity maintained. In this paper, we propose a Spatial Toeplitz Inverse Covariance-Based Clustering (STICC) method that considers both attributes and spatial relationships of geographic objects for multivariate spatial clustering. A subregion is created for each geographic object serving as the basic unit when performing clustering. A Markov random field is then constructed to characterize the attribute dependencies of subregions. Using a spatial consistency strategy, nearby objects are encouraged to belong to the same cluster. To test the performance of the proposed STICC algorithm, we apply it in two use cases. The comparison results with several baseline methods show that the STICC outperforms others significantly in terms of adjusted rand index and macro-F1 score. Join count statistics are also calculated and show that the spatial contiguity is well preserved by STICC. Such a spatial clustering method may benefit various applications in the fields of geography, remote sensing, transportation, and urban planning, etc.
    An analytic theory for the dynamics of wide quantum neural networks. (arXiv:2203.16711v1 [quant-ph])
    Parametrized quantum circuits can be used as quantum neural networks and have the potential to outperform their classical counterparts when trained for addressing learning problems. To date, many of the results on their performance on practical problems are heuristic in nature. In particular, the convergence rate for the training of quantum neural networks is not fully understood. Here, we analyze the dynamics of gradient descent for the training error of a class of variational quantum machine learning models. We define wide quantum neural networks as parameterized quantum circuits in the limit of a large number of qubits and variational parameters. We then find a simple analytic formula that captures the average behavior of their loss function and discuss the consequences of our findings. For example, for random quantum circuits, we predict and characterize an exponential decay of the residual training error as a function of the parameters of the system. We finally validate our analytic results with numerical experiments.
    Neural Q-learning for solving elliptic PDEs. (arXiv:2203.17128v1 [math.NA])
    Solving high-dimensional partial differential equations (PDEs) is a major challenge in scientific computing. We develop a new numerical method for solving elliptic-type PDEs by adapting the Q-learning algorithm in reinforcement learning. Our "Q-PDE" algorithm is mesh-free and therefore has the potential to overcome the curse of dimensionality. Using a neural tangent kernel (NTK) approach, we prove that the neural network approximator for the PDE solution, trained with the Q-PDE algorithm, converges to the trajectory of an infinite-dimensional ordinary differential equation (ODE) as the number of hidden units $\rightarrow \infty$. For monotone PDEs (i.e. those given by monotone operators, which may be nonlinear), despite the lack of a spectral gap in the NTK, we then prove that the limit neural network, which satisfies the infinite-dimensional ODE, converges in $L^2$ to the PDE solution as the training time $\rightarrow \infty$. More generally, we can prove that any fixed point of the wide-network limit for the Q-PDE algorithm is a solution of the PDE (not necessarily under the monotone condition). The numerical performance of the Q-PDE algorithm is studied for several elliptic PDEs.
    Graph Node-Feature Convolution for Representation Learning. (arXiv:1812.00086v2 [cs.LG] UPDATED)
    Graph convolutional network (GCN) is an emerging neural network approach. It learns a new representation of a node by aggregating the feature vectors of all its neighbors in the aggregation process, without considering whether the neighbors or their features are useful. Recent methods have improved on this by sampling a fixed-size set of neighbors, or by assigning different weights to different neighbors in the aggregation process, but features within a feature vector are still treated equally. In this paper, we introduce a new convolution operation on regular-size feature maps constructed from the features of a fixed node bandwidth via sampling to get the first-level node representation, which is then passed to a standard GCN to learn the second-level node representation. Experiments show that our method outperforms competing methods in semi-supervised node classification tasks. Furthermore, our method opens new doors for exploring new GCN architectures, particularly deeper GCN models.
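As a reference point, the standard GCN aggregation step that this line of work builds on can be sketched in a few lines of NumPy. This is a generic illustration of uniform neighbour averaging (the behaviour the paper's node-feature convolution is designed to improve on), not the paper's own method; the toy graph and weights are assumptions.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN layer: symmetric-normalised neighbour averaging, then a
    linear transform and ReLU. Every neighbour feature is weighted equally,
    which is the limitation node-feature convolution targets."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ X @ W, 0.0)  # ReLU

# toy graph: 3 nodes in a path 0-1-2, 2-dim features, identity weights
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)
H = gcn_layer(A, X, W)
print(H.shape)  # → (3, 2)
```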
    MBORE: Multi-objective Bayesian Optimisation by Density-Ratio Estimation. (arXiv:2203.16912v1 [cs.LG])
    Optimisation problems often have multiple conflicting objectives that can be computationally and/or financially expensive to evaluate. Mono-surrogate Bayesian optimisation (BO) is a popular model-based approach for optimising such black-box functions. It combines objective values via scalarisation and builds a Gaussian process (GP) surrogate of the scalarised values. The location which maximises a cheap-to-query acquisition function is chosen as the next location to expensively evaluate. While BO is an effective strategy, the use of GPs is limiting. Their performance decreases as the problem input dimensionality increases, and their computational complexity scales cubically with the amount of data. To address these limitations, we extend previous work on BO by density-ratio estimation (BORE) to the multi-objective setting. BORE links the computation of the probability of improvement acquisition function to that of probabilistic classification. This enables the use of state-of-the-art classifiers in a BO-like framework. In this work we present MBORE: multi-objective Bayesian optimisation by density-ratio estimation, and compare it to BO across a range of synthetic and real-world benchmarks. We find that MBORE performs as well as or better than BO on a wide variety of problems, and that it outperforms BO on high-dimensional and real-world problems.
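The core BORE idea that MBORE extends — reducing the probability-of-improvement acquisition to probabilistic classification — can be sketched as follows. The toy objective, quadratic features and training loop are illustrative assumptions, and the multi-objective scalarisation step is omitted: label the best gamma-fraction of points as class 1, fit a classifier, and evaluate next where the classifier is most confident.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Toy 1-D objective to minimise (an assumption for illustration)."""
    return (x - 0.3) ** 2

# observations gathered so far
X = rng.uniform(0, 1, size=40)
y = f(X)

# BORE step 1: label the best gamma-fraction of observations as class 1
gamma = 0.25
tau = np.quantile(y, gamma)
z = (y <= tau).astype(float)

# BORE step 2: fit a probabilistic classifier; its predicted probability
# plays the role of the probability-of-improvement acquisition function.
# Tiny logistic regression on quadratic features, trained by gradient descent.
Phi = np.stack([np.ones_like(X), X, X ** 2], axis=1)
w = np.zeros(3)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-Phi @ w))
    w -= 0.5 * Phi.T @ (p - z) / len(z)

# BORE step 3: evaluate next wherever the classifier is most confident
cand = np.linspace(0, 1, 201)
Phi_c = np.stack([np.ones_like(cand), cand, cand ** 2], axis=1)
acq = 1.0 / (1.0 + np.exp(-Phi_c @ w))
x_next = float(cand[np.argmax(acq)])
print(round(x_next, 2))
```

In MBORE the same machinery is applied to scalarised multi-objective values, so any state-of-the-art classifier can replace the toy logistic regression above.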
    Learning from many trajectories. (arXiv:2203.17193v1 [cs.LG])
    We initiate a study of supervised learning from many independent sequences ("trajectories") of non-independent covariates, reflecting tasks in sequence modeling, control, and reinforcement learning. Conceptually, our multi-trajectory setup sits between two traditional settings in statistical learning theory: learning from independent examples and learning from a single auto-correlated sequence. Our conditions for efficient learning generalize the former setting--trajectories must be non-degenerate in ways that extend standard requirements for independent examples. They do not require that trajectories be ergodic, long, or strictly stable. For linear least-squares regression, given $n$-dimensional examples produced by $m$ trajectories, each of length $T$, we observe a notable change in statistical efficiency as the number of trajectories increases from a few (namely $m \lesssim n$) to many (namely $m \gtrsim n$). Specifically, we establish that the worst-case error rate of this problem is $\Theta(n / m T)$ whenever $m \gtrsim n$. Meanwhile, when $m \lesssim n$, we establish a (sharp) lower bound of $\Omega(n^2 / m^2 T)$ on the worst-case error rate, realized by a simple, marginally unstable linear dynamical system. A key upshot is that, in domains where trajectories regularly reset, the error rate eventually behaves as if all of the examples were independent altogether, drawn from their marginals. As a corollary of our analysis, we also improve guarantees for the linear system identification problem.
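A minimal illustration of the many-trajectory regime, with an assumed toy linear system and noise level rather than the paper's general setting: pooling all $mT$ covariate/response pairs from $m$ independent trajectories into one least-squares problem, where the estimation error shrinks roughly like $n/(mT)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, T = 5, 50, 20          # state dim, number of trajectories, length

# stable linear system x_{t+1} = A x_t + noise; regress x_{t+1} on x_t
# by pooling all m trajectories (the many-trajectory regime, m >> n)
A = 0.9 * np.eye(n)
X, Y = [], []
for _ in range(m):
    x = rng.normal(size=n)   # each trajectory restarts independently
    for _ in range(T):
        x_next = A @ x + 0.1 * rng.normal(size=n)
        X.append(x)
        Y.append(x_next)
        x = x_next
X, Y = np.array(X), np.array(Y)

# pooled least-squares estimate of A from all m*T pairs
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
err = float(np.linalg.norm(A_hat - A))
print(round(err, 3))
```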
    SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping. (arXiv:2203.16749v1 [eess.AS])
    Neural vocoder using denoising diffusion probabilistic model (DDPM) has been improved by adaptation of the diffusion noise distribution to given acoustic features. In this study, we propose SpecGrad, which adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram. This adaptation by time-varying filtering improves the sound quality, especially in the high-frequency bands. It is processed in the time-frequency domain to keep the computational cost almost the same as conventional DDPM-based neural vocoders. Experimental results showed that SpecGrad generates higher-fidelity speech waveforms than conventional DDPM-based neural vocoders in both analysis-synthesis and speech enhancement scenarios. Audio demos are available at wavegrad.github.io/specgrad/.
    When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?. (arXiv:2110.04184v2 [cs.LG] UPDATED)
    Multi-agent reinforcement learning has made substantial empirical progress in solving games with a large number of players. However, theoretically, the best known sample complexity for finding a Nash equilibrium in general-sum games scales exponentially in the number of players due to the size of the joint action space, and there is a matching exponential lower bound. This paper investigates what learning goals admit better sample complexities in the setting of $m$-player general-sum Markov games with $H$ steps, $S$ states, and $A_i$ actions per player. First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5S\max_{i\le m} A_i / \epsilon^2)$ episodes, and an $\epsilon$-Correlated Equilibrium (CE) in $\widetilde{\mathcal{O}}(H^6S\max_{i\le m} A_i^2 / \epsilon^2)$ episodes. This is the first line of results for learning CCE and CE with sample complexities polynomial in $\max_{i\le m} A_i$. Our algorithm for learning CE integrates an adversarial bandit subroutine which minimizes a weighted swap regret, along with several novel designs in the outer loop. Second, we consider the important special case of Markov Potential Games, and design an algorithm that learns an $\epsilon$-approximate Nash equilibrium within $\widetilde{\mathcal{O}}(S\sum_{i\le m} A_i / \epsilon^3)$ episodes (when only highlighting the dependence on $S$, $A_i$, and $\epsilon$), which depends only linearly on $\sum_{i\le m} A_i$ and significantly improves over existing efficient algorithms in the $\epsilon$ dependence. Overall, our results shed light on what equilibria or structural assumptions on the game may enable sample-efficient learning with many players.
    When Physics Meets Machine Learning: A Survey of Physics-Informed Machine Learning. (arXiv:2203.16797v1 [cs.LG])
    Physics-informed machine learning (PIML) combines prior knowledge of physics -- the high-level abstraction of natural phenomena and human behaviour accumulated over a long history -- with data-driven machine learning models. It has emerged as an effective way to mitigate the shortage of training data, to increase models' generalizability, and to ensure the physical plausibility of results. In this paper, we survey a large number of recent works in PIML and summarize them from three aspects: (1) motivations of PIML, (2) physics knowledge in PIML, and (3) methods of physics knowledge integration in PIML. We also discuss current challenges and corresponding research opportunities in PIML.
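One common form of physics knowledge integration — adding a physics-residual term to the data-fitting loss — can be sketched on a toy ODE. The polynomial basis, collocation grid and weighting below are illustrative choices, not from the survey; because both terms are linear in the coefficients, a single least-squares solve minimises the combined loss.

```python
import numpy as np

# Fit u(x) ≈ sum_k c_k x^k on [0,1] to a few observations of u while also
# penalising the residual of the ODE u'(x) + u(x) = 0 (true solution e^{-x}).
deg = 5
x_data = np.array([0.0, 0.5, 1.0])
y_data = np.exp(-x_data)              # sparse (here noise-free) observations
x_col = np.linspace(0, 1, 50)         # collocation points for the physics term

def basis(x):                          # [x^0, ..., x^deg]
    return np.stack([x ** k for k in range(deg + 1)], axis=1)

def dbasis(x):                         # element-wise derivatives of the basis
    return np.stack([k * x ** max(k - 1, 0) for k in range(deg + 1)], axis=1)

lam = 1.0                              # weight of the physics loss
A = np.vstack([basis(x_data), lam * (dbasis(x_col) + basis(x_col))])
b = np.concatenate([y_data, np.zeros(len(x_col))])
c = np.linalg.lstsq(A, b, rcond=None)[0]

x_test = np.linspace(0, 1, 11)
err = float(np.max(np.abs(basis(x_test) @ c - np.exp(-x_test))))
print(round(err, 5))
```

Even with only three data points, the physics term pins the fit close to the true solution everywhere on the interval — the data-shortage mitigation the survey describes.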
    Distributional Robust Batch Contextual Bandits. (arXiv:2006.05630v4 [cs.LG] UPDATED)
    Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data -- an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm in synthetic datasets and demonstrate that it provides the robustness that is missing using standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.
    A statistical framework for efficient out of distribution detection in deep neural networks. (arXiv:2102.12967v3 [cs.LG] UPDATED)
    Background. Commonly, Deep Neural Networks (DNNs) generalize well on samples drawn from a distribution similar to that of the training set. However, DNNs' predictions are brittle and unreliable when the test samples are drawn from a dissimilar distribution. This is a major concern for deployment in real-world applications, where such behavior may come at a considerable cost, such as industrial production lines, autonomous vehicles, or healthcare applications. Contributions. We frame Out Of Distribution (OOD) detection in DNNs as a statistical hypothesis testing problem. Tests generated within our proposed framework combine evidence from the entire network. Unlike previous OOD detection heuristics, this framework returns a $p$-value for each test sample. It is guaranteed to maintain the Type I Error (T1E - incorrectly predicting OOD for an actual in-distribution sample) for test data. Moreover, this allows combining several detectors while maintaining the T1E. Building on this framework, we suggest a novel OOD procedure based on low-order statistics. Our method achieves comparable or better results than state-of-the-art methods on well-accepted OOD benchmarks, without retraining the network parameters or assuming prior knowledge on the test distribution -- and at a fraction of the computational cost.
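The framework's p-value outputs make classical combination rules applicable. As a generic illustration (the paper's own test statistics are more elaborate), Fisher's method merges several independent detector p-values into one while preserving a valid Type I error: under the null, $-2\sum_i \ln p_i$ is chi-squared with $2k$ degrees of freedom, whose survival function has a closed form for even degrees of freedom.

```python
import math

def fisher_combine(pvals):
    """Fisher's method: combine k independent p-values into one.
    Under H0, -2*sum(log p_i) ~ chi-squared with 2k degrees of freedom;
    for even dof the survival function is exp(-t/2) * sum_{j<k} (t/2)^j / j!."""
    k = len(pvals)
    t = -2.0 * sum(math.log(p) for p in pvals)
    half = t / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))

print(round(fisher_combine([0.5, 0.5]), 4))  # → 0.5966
```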
    Causal Feature Selection for Algorithmic Fairness. (arXiv:2006.06053v2 [cs.LG] UPDATED)
    The use of machine learning (ML) in high-stakes societal decisions has encouraged the consideration of fairness throughout the ML lifecycle. Although data integration is one of the primary steps to generate high quality training data, most of the fairness literature ignores this stage. In this work, we consider fairness in the integration component of data management, aiming to identify features that improve prediction without adding any bias to the dataset. We work under the causal interventional fairness paradigm. Without requiring the underlying structural causal model a priori, we propose an approach to identify a sub-collection of features that ensure the fairness of the dataset by performing conditional independence tests between different subsets of features. We use group testing to improve the complexity of the approach. We theoretically prove the correctness of the proposed algorithm to identify features that ensure interventional fairness and show that sub-linear conditional independence tests are sufficient to identify these variables. A detailed empirical evaluation is performed on real-world datasets to demonstrate the efficacy and efficiency of our technique.
    Fast, Accurate and Memory-Efficient Partial Permutation Synchronization. (arXiv:2203.16505v2 [cs.CV] UPDATED)
    Previous partial permutation synchronization (PPS) algorithms, which are commonly used for multi-object matching, often involve computation-intensive and memory-demanding matrix operations. These operations become intractable for large scale structure-from-motion datasets. For pure permutation synchronization, the recent Cycle-Edge Message Passing (CEMP) framework suggests a memory-efficient and fast solution. Here we overcome the restriction of CEMP to compact groups and propose an improved algorithm, CEMP-Partial, for estimating the corruption levels of the observed partial permutations. It allows us to subsequently implement a nonconvex weighted projected power method without the need of spectral initialization. The resulting new PPS algorithm, MatchFAME (Fast, Accurate and Memory-Efficient Matching), only involves sparse matrix operations, and thus enjoys lower time and space complexities in comparison to previous PPS algorithms. We prove that under adversarial corruption, though without additive noise and with certain assumptions, CEMP-Partial is able to exactly classify corrupted and clean partial permutations. We demonstrate the state-of-the-art accuracy, speed and memory efficiency of our method on both synthetic and real datasets.
    Schema matching using Gaussian mixture models with Wasserstein distance. (arXiv:2111.14244v2 [cs.LG] UPDATED)
    Gaussian mixture models are a powerful tool, mostly for the clustering problem, but with proper preparation also for feature extraction, pattern recognition, image segmentation, and machine learning in general. When faced with the schema matching problem, different mixture models computed on different pieces of data can retain crucial information about the structure of the dataset. To measure or compare results from mixture models, the Wasserstein distance can be very useful; however, it is not easy to calculate for mixture distributions. In this paper we derive one possible approximation of the Wasserstein distance between Gaussian mixture models and reduce it to a linear problem. Furthermore, we show application examples on real-world data.
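For single Gaussians the 2-Wasserstein distance is available in closed form, which is the building block such GMM approximations typically combine via a transport plan between mixture components. A sketch for the diagonal-covariance case (an assumption made here for simplicity; the general case involves a matrix square root):

```python
import numpy as np

def w2_gaussians_diag(mu1, s1, mu2, s2):
    """Squared 2-Wasserstein distance between Gaussians with diagonal
    covariances (variances s1, s2). The Bures term then reduces to a
    per-coordinate sum: ||mu1-mu2||^2 + sum (sqrt(s1)-sqrt(s2))^2."""
    mu1, s1 = np.asarray(mu1, float), np.asarray(s1, float)
    mu2, s2 = np.asarray(mu2, float), np.asarray(s2, float)
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.sum((np.sqrt(s1) - np.sqrt(s2)) ** 2))

print(w2_gaussians_diag([0, 0], [1, 1], [3, 4], [1, 1]))  # → 25.0
```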
    Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracle. (arXiv:2203.16668v1 [cs.LG])
    Many popular contextual bandit algorithms estimate reward models to inform decision making. However, true rewards can contain action-independent redundancies that are not relevant for decision making and only increase the statistical complexity of accurate estimation. It is sufficient and more data-efficient to estimate the simplest function that explains the reward differences between actions, that is, the heterogeneous treatment effect, commonly understood to be more structured and simpler than the reward. Motivated by this observation, building on recent work on oracle-based algorithms, we design a statistically optimal and computationally efficient algorithm using heterogeneous treatment effect estimation oracles. Our results provide the first universal reduction of contextual bandits to a general-purpose heterogeneous treatment effect estimation method. We show that our approach is more robust to model misspecification than reward estimation methods based on squared error regression oracles. Experimentally, we show the benefits of heterogeneous treatment effect estimation in contextual bandits over reward estimation.
    System Identification via Nuclear Norm Regularization. (arXiv:2203.16673v1 [stat.ML])
    This paper studies the problem of identifying low-order linear systems via Hankel nuclear norm regularization. Hankel regularization encourages the low-rankness of the Hankel matrix, which maps to the low-orderness of the system. We provide novel statistical analysis for this regularization and carefully contrast it with the unregularized ordinary least-squares (OLS) estimator. Our analysis leads to new bounds on estimating the impulse response and the Hankel matrix associated with the linear system. We first design an input excitation and show that Hankel regularization enables one to recover the system using the optimal number of observations in the true system order and achieve strong statistical estimation rates. Surprisingly, we demonstrate that the input design indeed matters, by showing that intuitive choices such as i.i.d. Gaussian inputs lead to provably sub-optimal sample complexity. To better understand the benefits of regularization, we also revisit the OLS estimator. Besides refining existing bounds, we experimentally identify when the regularized approach improves over OLS: (1) for low-order systems with slow impulse-response decay, the OLS method performs poorly in terms of sample complexity; (2) the Hankel matrix returned by regularization has a clearer singular value gap that eases identification of the system order; (3) Hankel regularization is less sensitive to the hyperparameter choice. Finally, we establish model selection guarantees through a joint train-validation procedure where we tune the regularization parameter for near-optimal estimation.
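The link between system order and Hankel low-rankness is easy to see numerically. The order-2 system below is an illustrative assumption; its Hankel matrix has exactly two non-negligible singular values, the sharp gap that regularization is meant to preserve.

```python
import numpy as np

def hankel(h, rows, cols):
    """Hankel matrix H[i, j] = h[i + j] built from an impulse response."""
    return np.array([[h[i + j] for j in range(cols)] for i in range(rows)])

# impulse response h[k] = C A^k B of a (true) order-2 SISO system
A = np.array([[0.8, 0.0], [0.0, 0.5]])
B = np.array([1.0, 1.0])
C = np.array([1.0, -1.0])
h = [C @ np.linalg.matrix_power(A, k) @ B for k in range(20)]

H = hankel(h, 10, 10)
s = np.linalg.svd(H, compute_uv=False)
# rank of the Hankel matrix equals the system order: a sharp singular gap
print(int(np.sum(s > 1e-8)))  # → 2
```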
    Challenges in leveraging GANs for few-shot data augmentation. (arXiv:2203.16662v1 [stat.ML])
    In this paper, we explore the use of GAN-based few-shot data augmentation as a method to improve few-shot classification performance. We perform an exploration into how a GAN can be fine-tuned for such a task (one of which is in a class-incremental manner), as well as a rigorous empirical investigation into how well these models can perform to improve few-shot classification. We identify issues related to the difficulty of training such generative models under a purely supervised regime with very few examples, as well as issues regarding the evaluation protocols of existing works. We also find that in this regime, classification accuracy is highly sensitive to how the classes of the dataset are randomly split. Therefore, we propose a semi-supervised fine-tuning approach as a more pragmatic way forward to address these problems.
    Spatially Adaptive Online Prediction of Piecewise Regular Functions. (arXiv:2203.16587v1 [math.ST])
    We consider the problem of estimating piecewise regular functions in an online setting, i.e., the data arrive sequentially and at any round our task is to predict the value of the true function at the next revealed point using the available data from past predictions. We propose a suitably modified version of a recently developed online learning algorithm called the sleeping experts aggregation algorithm. We show that this estimator satisfies oracle risk bounds simultaneously for all local regions of the domain. As concrete instantiations of the expert aggregation algorithm proposed here, we study an online mean aggregation and an online linear regression aggregation algorithm where experts correspond to the set of dyadic subrectangles of the domain. The resulting algorithms are near linear time computable in the sample size. We specifically focus on the performance of these online algorithms in the context of estimating piecewise polynomial and bounded variation function classes in the fixed design setup. The simultaneous oracle risk bounds we obtain for these estimators in this context provide new and improved (in certain aspects) guarantees even in the batch setting and are not available for the state of the art batch learning estimators.
    Towards Differential Relational Privacy and its use in Question Answering. (arXiv:2203.16701v1 [cs.LG])
    Memorization of the relation between entities in a dataset can lead to privacy issues when using a trained model for question answering. We introduce Relational Memorization (RM) to understand, quantify and control this phenomenon. While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning. The difference is most pronounced when the data distribution is long-tailed, with many queries having only a few training examples: impeding general memorization prevents effective learning, while impeding only relational memorization still allows learning general properties of the underlying concepts. We formalize the notion of Relational Privacy (RP) and, inspired by Differential Privacy (DP), we provide a possible definition of Differential Relational Privacy (DrP). These notions can be used to describe and compute bounds on the amount of RM in a trained model. We illustrate Relational Privacy concepts in experiments with large-scale models for Question Answering.
    Wind Farm Layout Optimisation using Set Based Multi-objective Bayesian Optimisation. (arXiv:2203.17065v1 [stat.ML])
    Wind energy is one of the cleanest renewable electricity sources and can help in addressing the challenge of climate change. One of the drawbacks of wind-generated energy is the large space necessary to install a wind farm; this arises from the fact that placing wind turbines in a limited area would hinder their productivity and therefore not be economically convenient. This naturally leads to an optimisation problem, which has three specific challenges: (1) multiple conflicting objectives, (2) computationally expensive simulation models, and (3) optimisation over design sets instead of design vectors. The first and second challenges can be addressed by using surrogate-assisted (e.g. Bayesian) multi-objective optimisation. However, traditional Bayesian optimisation cannot be applied, as the objective function in the problem relies on design sets instead of design vectors. This paper extends the applicability of Bayesian multi-objective optimisation to set-based optimisation for solving the wind farm layout problem. We use a set-based kernel in a Gaussian process to quantify the correlation between wind farms (with different numbers of turbines). The results on the given data set of wind energy and direction clearly show the potential of using set-based Bayesian multi-objective optimisation.
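One simple set-based kernel (assumed here for illustration; the paper's construction may differ) is the mean pairwise RBF kernel, which is well-defined between wind farms with different numbers of turbines and can be dropped into a Gaussian process as the covariance between two layouts.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Standard RBF kernel between two turbine positions."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def set_kernel(S1, S2, gamma=1.0):
    """Mean pairwise RBF kernel between two point sets: one simple way to
    compare layouts with different numbers of elements (e.g. turbines)."""
    return float(np.mean([rbf(a, b, gamma) for a in S1 for b in S2]))

farm_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # 3 turbines
farm_b = np.array([[0.1, 0.0], [0.9, 0.1]])               # 2 turbines
k_ab = set_kernel(farm_a, farm_b)
print(round(k_ab, 2))  # → 0.57
```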
    Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization. (arXiv:2107.12438v3 [math.OC] UPDATED)
    Motivated by the poor performance of cross-validation in settings where data are scarce, we propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization. Our approach exploits the optimization problem's sensitivity analysis to estimate the gradient of the optimal objective value with respect to the amount of noise in the data and uses the estimated gradient to debias the policy's in-sample performance. Unlike cross-validation techniques, our approach avoids sacrificing data for a test set, utilizes all data when training and, hence, is well-suited to settings where data are scarce. We prove bounds on the bias and variance of our estimator for optimization problems with uncertain linear objectives but known, potentially non-convex, feasible regions. For more specialized optimization problems where the feasible region is "weakly-coupled" in a certain sense, we prove stronger results. Specifically, we provide explicit high-probability bounds on the error of our estimator that hold uniformly over a policy class and depend on the problem's dimension and the policy class's complexity. Our bounds show that under mild conditions, the error of our estimator vanishes as the dimension of the optimization problem grows, even if the amount of available data remains small and constant. Said differently, we prove our estimator performs well in the small-data, large-scale regime. Finally, we numerically compare our proposed method to state-of-the-art approaches through a case study on dispatching emergency medical response services using real data. Our method provides more accurate estimates of out-of-sample performance and learns better-performing policies.
    An energy-based deep splitting method for the nonlinear filtering problem. (arXiv:2203.17153v1 [stat.CO])
    The main goal of this paper is to approximately solve the nonlinear filtering problem through deep learning. This is achieved by solving the Zakai equation by a deep splitting method, previously developed for approximate solution of (stochastic) partial differential equations. This is combined with an energy-based model for the approximation of functions by a deep neural network. This results in a computationally fast filter that takes observations as input and that does not require re-training when new observations are received. The method is tested on three examples, one linear Gaussian and two nonlinear. The method shows promising performance when benchmarked against the Kalman filter and the bootstrap particle filter.
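For context, the bootstrap particle filter used as a benchmark above can be written in a few lines. The 1-D nonlinear state-space model and noise levels below are illustrative assumptions: propagate particles through the dynamics, weight by observation likelihood, estimate, then resample.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x):
    """Assumed nonlinear dynamics: x_t = 0.5*x_{t-1} + sin(x_{t-1}) + noise."""
    return 0.5 * x + np.sin(x)

# simulate a ground-truth trajectory and noisy observations y_t = x_t + noise
T, N = 50, 500
xs, ys = [], []
x_true = 0.0
for _ in range(T):
    x_true = step(x_true) + 0.3 * rng.normal()
    xs.append(x_true)
    ys.append(x_true + 0.2 * rng.normal())

# bootstrap particle filter: propagate, weight, estimate, resample
particles = rng.normal(size=N)
estimates = []
for y in ys:
    particles = step(particles) + 0.3 * rng.normal(size=N)   # propagate
    w = np.exp(-0.5 * ((y - particles) / 0.2) ** 2)          # likelihood
    w /= w.sum()
    estimates.append(float(np.sum(w * particles)))           # weighted mean
    particles = rng.choice(particles, size=N, p=w)           # resample

rmse = float(np.sqrt(np.mean((np.array(estimates) - np.array(xs)) ** 2)))
print(round(rmse, 3))
```

Unlike the deep-splitting filter described above, this benchmark re-runs the weighting and resampling loop for every new observation sequence.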
    Recommender Systems meet Mechanism Design. (arXiv:2110.12558v2 [cs.GT] UPDATED)
    Machine learning has developed a variety of tools for learning and representing high-dimensional distributions with structure. Recent years have also seen big advances in designing multi-item mechanisms. Akin to overfitting, however, these mechanisms can be extremely sensitive to the Bayesian prior that they target, which becomes problematic when that prior is only approximately known. At the same time, even if access to the exact Bayesian prior is given, it is known that optimal or even approximately optimal multi-item mechanisms run into sample, computational, representation and communication intractability barriers. We consider a natural class of multi-item mechanism design problems with very large numbers of items, but where the bidders' value distributions can be well-approximated by a topic model akin to those used in recommendation systems with very large numbers of possible recommendations. We propose a mechanism design framework for this setting, building on a recent robustification framework by Brustle et al., which disentangles the statistical challenge of estimating a multi-dimensional prior from the task of designing a good mechanism for it, and robustifies the performance of the latter against the estimation error of the former. We provide an extension of this framework appropriate for our setting, which allows us to exploit the expressive power of topic models to reduce the effective dimensionality of the mechanism design problem and remove the dependence of its computational, communication and representation complexity on the number of items.
    A Unifying Framework for Reinforcement Learning and Planning. (arXiv:2006.15009v4 [cs.LG] UPDATED)
    Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are reinforcement learning and planning, which both largely have their own research communities. However, if both research fields solve the same problem, then we might be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have to decide. At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions. Altogether, the framework may help provide deeper insight into the algorithmic design space of planning and reinforcement learning.

  • Open

    Seeking respondents for a survey about AI text generation and reader interpretation of poetry
    Hello! I am doing an independent (non-academic) research study about AI text generation as it relates to poetry and reader interpretation. The results of the study will be presented in a YouTube video. I would really appreciate it if some folks could take approximately 20-25 minutes to complete this anonymous survey I put together. It involves reading some poems and answering questions about those poems. Thank you so much for the help! https://form.jotform.com/220880249866062 submitted by /u/northern_frog [link] [comments]  ( 1 min )
    The Token-Dropping Approach Used By ML Researchers From Google and NYU Reduces BERT Pretraining Time And Cost By 25%
    The pretraining of BERT-type large language models, which may scale up to billions of parameters, is essential to achieving best-in-class performance on various natural language processing (NLP) applications. However, the pretraining procedure is costly, and it has become a hurdle for the industrial deployment of large language models. In a research paper, researchers from Google, New York University, and the University of Maryland recommend a simple but effective "token dropping" method that drastically reduces the pretraining cost of transformer models like BERT while maintaining downstream fine-tuning performance. Token dropping is a technique for speeding up the pretraining of transformer models like BERT without sacrificing their performance on downstream tasks. Starting with an intermediate layer in the model, they eliminate uninteresting tokens to let the model focus on key tokens more effectively, given its limited computing resources. The model's last layer then picks up the dropped tokens, producing full-length sequences. They use the built-in masked language modeling (MLM) loss and its dynamics to detect non-essential tokens with little computing complexity. According to their tests, this straightforward strategy decreases BERT's pretraining cost by 25% while yielding somewhat higher overall fine-tuning performance on conventional downstream tasks. Continue Reading The Summary. Paper: https://arxiv.org/pdf/2203.13240.pdf Github: https://github.com/tensorflow/models/tree/master/official/projects/token_dropping submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
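The bookkeeping behind token dropping — running the middle layers on a subset of tokens and restoring the full sequence before the last layer — can be sketched with plain array indexing. The random scores below stand in for the MLM-loss-based importance signal, and `tanh` stands in for the middle layers; this is an illustration of the gather/scatter mechanics, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden, keep = 8, 4, 5

states = rng.normal(size=(seq_len, hidden))  # hidden states after early layers
scores = rng.random(seq_len)                 # stand-in for cumulative MLM loss

# keep the `keep` highest-scoring ("hard") tokens for the middle layers
keep_idx = np.sort(np.argsort(scores)[-keep:])
reduced = states[keep_idx]                   # (keep, hidden): cheaper middle layers
processed = np.tanh(reduced)                 # stand-in for the middle layers

# the last layer sees the full sequence again: scatter kept tokens back,
# while dropped tokens pass through with their early-layer states
full = states.copy()
full[keep_idx] = processed
print(full.shape)  # → (8, 4)
```

The compute saving comes from the middle layers operating on `keep` rather than `seq_len` tokens, while the final layer still produces a full-length sequence.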
    Top emerging artificial intelligence use cases
    submitted by /u/Visionifyai [link] [comments]
    Meta’s new speech AI can laugh, scream, yawn, and chit-chat
    Meta unveils new research on speech AI: machine-generated voices can now cry, laugh, yawn or make more natural small talk. ... Meta's speech AI can now mimic emotional sounds such as laughing, yawning, or crying – which it says is important in communication to better convey the intention and context of a statement. ... the new GSLM model dGSLM, which is optimized for dialogs, generates more natural-sounding audio dialogs using AI agents that can pause for thought or process overlaps in conversations. ... dGSLM was trained with about 2000 hours of unlabeled audio dialogues from the Fisher dataset, which contains about 16000 English-language telephone conversations. Source and demos: https://mixed-news.com/en/meta-new-speech-ai-can-laugh-scream-yawn/ submitted by /u/Sephirio [link] [comments]  ( 1 min )
    Oracle Releases MySQL HeatWave ML That Adds Powerful Machine Learning Capabilities to MySQL Applications
    Integrating machine learning capabilities into MySQL systems is prohibitively difficult and time-consuming. The process involves extracting data from the database into another system to construct and deploy machine learning models. As data flows around, this strategy produces silos for applying machine learning to application data and causes latency. This results in data leakage, making the database more open to security attacks. Moreover, existing machine learning (ML) solutions cannot explain why the models developers build deliver specific predictions. Recently, Oracle released MySQL HeatWave, the only MySQL cloud database service that supports in-database machine learning (ML). It automates the ML lifecycle and saves all trained models in the MySQL database, removing the need to migrate data or models to a separate machine learning tool or service. This decreases application complexity, saves costs, and increases data and model security. It produces a model with the best algorithm, features, and hyper-parameters for a specific dataset and application. Continue Reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Cloudy World AI Art
    submitted by /u/Recent_Coffee_2551 [link] [comments]
    How Chatbot is beneficial in the Retail Industry?
By offering your clients a blended virtual, tailored, on-the-spot level of conversation, you can engage them for longer and create an interactive method of selling to them. Chatbots armed with conversational AI, and powered by an in-depth understanding of the client and their past behaviors, will be able to offer the service most likely to suit each client. Retailers can take advantage of the ability to engage potential customers through virtual attendants, along with a growing range of new and exciting ways to utilize chatbots within retail spaces. A lack of customer engagement through mobile messaging costs retailers around $1 trillion annually, as customers are increasingly reluctant to contact mainstream outlets for recommendations due to past negative experiences, or because they have already made up their mind about what they want. This can potentially be overcome with chatbots: AI-powered systems that can be deployed across multiple channels, including the outlet's website and branded social media profiles. submitted by /u/botgo_io [link] [comments]  ( 1 min )
    Game AI Question (Retraining without losing characteristics)
    submitted by /u/ICouldDoButWhyWouldI [link] [comments]  ( 1 min )
    [Disco Diffusion v5] - "Bullet Time of Blood Animation"
    submitted by /u/JoshGrambo [link] [comments]
    If you could change one thing about building ML, what would it be?
I’d like to open this question up to people who are beginners, intermediates and experienced in the field of ML to get a wide variety of perspectives. If you could change/significantly improve one thing about building ML systems, what would it be? Some examples could be: reducing the computational overhead; reducing or eliminating the need for large datasets; simplifying the process of constructing models. However, it’s not limited to just those three. Curious to see where this goes! submitted by /u/holamyeung [link] [comments]  ( 1 min )
  • Open

    A Gentle Introduction to Decorators in Python
    When working on code, whether we know it or not, we often come across the decorator design pattern. This is […] The post A Gentle Introduction to Decorators in Python appeared first on Machine Learning Mastery.  ( 13 min )
  • Open

    [P] How to add padding to an image in Pytorch?
    Hi guys, I am trying to add padding to images in Pytorch - I need to standardize all the images in my dataset to be of the same size. I spent the whole day trying to find a good solution but nothing worked. I succeeded in resizing but that compromised my image quality, so that is why I want to proceed with padding. How to do this? Thanks in advance! :) submitted by /u/whyhateverything [link] [comments]  ( 1 min )
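For reference, a common fix is to pad each image up to a fixed square size instead of resizing, which preserves the original pixels. The helper below is a hypothetical sketch of the arithmetic only; in PyTorch you would hand its output to `torch.nn.functional.pad` (or use `torchvision.transforms.Pad` directly):

```python
def square_pad_amounts(height, width, target):
    """Per-side padding (left, right, top, bottom) that centres a
    height x width image inside a target x target square."""
    pad_w, pad_h = target - width, target - height
    left, top = pad_w // 2, pad_h // 2
    return (left, pad_w - left, top, pad_h - top)

# With PyTorch this would be applied as, e.g.:
#   import torch.nn.functional as F
#   l, r, t, b = square_pad_amounts(h, w, 128)
#   padded = F.pad(img, (l, r, t, b))  # pads the last two dims (W, then H)
print(square_pad_amounts(100, 80, 128))  # (24, 24, 14, 14)
```

Setting `target` to the largest side in the dataset standardizes every image without any resampling loss.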
    [P] SSO (Single Sign-On) for CVAT, the labelling tool
Hi everyone, For quite a long time, I have seen folks looking for a way to do SSO (Single Sign-On) for CVAT, a popular labeling tool. But unfortunately such a capability is not readily available. It only supports local authentication and LDAP. So we decided to make a change proactively, and now we are in the process of enabling SSO for it. The initial result looks promising. Check it out to see what we have done: https://www.youtube.com/watch?v=R7hBBLG5Fdc Is this something that you would love to have? Are there any other machine learning tools you would like to have SSO capability as well? Any feedback is welcome. submitted by /u/alexcgg1 [link] [comments]  ( 1 min )
    [D] Take Information Theory before the first course in machine learning?
    Hello, I will study further Reinforcement Learning and Deep Learning in the future. I have completed probability theory, linear algebra, and multivariable calculus. I am taking Mathematical Statistics. Should I take Information Theory (IT) before ML? For me, I would definitely take IT, but I don't know whether to take it now or later. submitted by /u/nwe2rw [link] [comments]  ( 1 min )
    [D] Paper Explained - Improving Intrinsic Exploration with Language Abstractions (Full Video Analysis)
https://youtu.be/NeGJAUSQEJI Exploration is one of the oldest challenges for Reinforcement Learning algorithms, with no clear solution to date. Especially in environments with sparse rewards, agents face significant challenges in deciding which parts of the environment to explore further. Providing intrinsic motivation in form of a pseudo-reward is sometimes used to overcome this challenge, but often relies on hand-crafted heuristics, and can lead to deceptive dead-ends. This paper proposes to use language descriptions of encountered states as a method of assessing novelty. In two procedurally generated environments, they demonstrate the usefulness of language, which is in itself highly concise and abstract and therefore lends itself well to this task. OUTLINE: 0:00 - Intro 1:10 - Paper Overview: Language for exploration 5:40 - The MiniGrid & MiniHack environments 7:00 - Annotating states with language 9:05 - Baseline algorithm: AMIGo 12:20 - Adding language to AMIGo 22:55 - Baseline algorithm: NovelD and Random Network Distillation 29:45 - Adding language to NovelD 31:50 - Aren't we just using extra data? 34:55 - Investigating the experimental results 40:45 - Final comments Paper: https://arxiv.org/abs/2202.08938 submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [N] Are we running out of AI benchmarks?
Benchmarks are an important way to measure progress in AI research – but artificial intelligence is constantly achieving new bests. Are we running out of AI benchmarks? ... Researchers at the Medical University of Vienna and the University of Oxford now show in a meta-study of AI benchmarks that saturated or stagnant benchmarks are common. The researchers examined 1,688 benchmarks with 406 tasks in computer vision and natural language processing since 2013, and draw the following conclusions: In some cases, there is continuous growth, such as in the ImageNet benchmark. However, a majority of all benchmarks quickly reach technological stagnation or saturation. In some cases, a lack of research interest is also a cause of stagnation. The researchers cite the UCF101 action recognition benchmark as an example of saturation. However, the dynamics of performance improvement do not follow a clearly discernible pattern: in some cases, phases of stagnation are followed by unpredictable leaps. This is what happened in the PROTEINS benchmark. ... Moreover, of the 1,688 benchmarks, only 66 percent have more than three results at different points in time – so in practice, about a third of all AI benchmarks are barely used and therefore of little value. ... In the future, new benchmarks should be developed by large, collaborative teams from many institutions, knowledge domains, and cultures to ensure high-quality benchmarks and avoid fragmentation of the benchmark landscape, the researchers conclude. Source: https://mixed-news.com/en/are-we-running-out-of-ai-benchmarks/ submitted by /u/Sephirio [link] [comments]  ( 1 min )
    [D] Methods for anomaly detection / clustering with high dimensional physics data
I'm looking for models, workflows, and algorithms in the pursuit of principled ways of conducting anomaly detection on high dimensional datasets from physical systems. I am already familiar with the application of autoencoders, isolation forests, etc. to trivial feature sets. I have feature sets that obey physical equations, so there is also the possibility of using differential equations or some prior generating process to bound what is and isn't an "outlier". Looking for papers/methods/texts that are in this vein. submitted by /u/memproc [link] [comments]  ( 2 min )
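One concrete way to fold a physics prior into anomaly detection is to score each sample by its residual against the governing equation and flag large deviations (or append that residual as an extra feature for an isolation forest or autoencoder). The sketch below is purely illustrative; the conservation law and the three-tuple sample layout are assumptions, not a known pipeline:

```python
import math

def residual_score(samples, tol=3.0):
    """Flag samples whose physics residual is a z-score outlier.

    Each (hypothetical) sample is (mass, velocity, measured_energy);
    the residual is the deviation from the illustrative law E = 1/2 m v^2.
    """
    residuals = [abs(e - 0.5 * m * v * v) for m, v, e in samples]
    mean = sum(residuals) / len(residuals)
    std = math.sqrt(sum((r - mean) ** 2 for r in residuals) / len(residuals))
    return [std > 0 and (r - mean) / std > tol for r in residuals]

# A sample violating the law stands out against law-abiding ones:
flags = residual_score([(1, 2, 2.0)] * 20 + [(1, 2, 50.0)])
print(flags[-1])  # True
```

The same residual column can be concatenated onto the raw feature set before fitting whichever detector you already use.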
    Rejecting GAN Off-Manifold Samples? [D]
    I am working on a project, where I do image editing in the latent space of an image. Are there any papers or suggestions on how to enforce that the samples lie on the manifold? submitted by /u/avd4292 [link] [comments]  ( 1 min )
    [R] Cross-lingual Wikipedia dataset
    Hi! For a research project, I am trying to create a dataset that contains: the abstract of an article in EN; the abstract of the article in simple EN; the rest of the article in EN; the rest of the article in simple EN. When I worked in one language, I preprocessed the XML directly (the APIs seemed quite slow for processing the whole encyclopedia). However, I am struggling to find a way to join Wikis in different languages, as the dumps seem not to include a language-independent id. This seems to be a relatively "standard" task for creating cross-lingual datasets, so I hope someone has some tips, and I do not need to spend the next week reinventing the wheel :) submitted by /u/ombelicoInfinito [link] [comments]  ( 1 min )
    [D] Activation functions for Neural Networks in Time Series
Hi everyone, I want to run a feedforward autoregressive network to forecast the potential sales of specific SKUs for the following months. The idea is to capture the non-linearity in the data. Does it make sense to use ReLU? Or, given that all my data points are positive values, will this function just return the input unchanged (max(0, x) = x) and therefore not be suitable for what I am trying to do? I have also checked other activation functions, but sigmoid, for example, is typically used for classification, and the hyperbolic tangent returns values that can be negative. Any help would be much appreciated. Thanks! submitted by /u/Old-Box-6684 [link] [comments]  ( 1 min )
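One point worth noting: ReLU acts on the hidden pre-activations (weighted sums plus bias), not on the raw inputs, and those sums can be negative even when every data point is positive, so hidden ReLU units still give you non-linearity; a linear output layer then produces the forecast. A tiny stdlib illustration of that distinction (the weights here are made up):

```python
def relu(x):
    return max(0.0, x)

# The hidden pre-activation w*x + b can be negative even for positive
# inputs x, so a hidden ReLU still bends the function:
w, b = -0.5, 10.0
for x in [5.0, 30.0]:
    pre = w * x + b          # 7.5 for x=5.0, -5.0 for x=30.0
    print(x, relu(pre))      # 5.0 -> 7.5, 30.0 -> 0.0
```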
    [D] I just watched AlphaGo - The Movie, are there any more well made documentaries about AI available?
    AlphaGo - The Movie on Youtube submitted by /u/Mighty__hammer [link] [comments]  ( 1 min )
    [P] Cyclemoid -- a new activation function inspired by cyclical learning rates; SOTA on several benchmarks
    Excited to share our latest research. The cyclemoid activation was inspired by the success of cyclical learning rate. Moreover, it has nice mathematical properties to stabilize gradients and maintain strong gradient signals in desired regions during training. We designed it as a drop-in replacement for ReLU, and we would love to hear what you think. The code is up on GitHub, and the preprint should be up soon, too: https://github.com/rasbt/cyclemoid-pytorch PS: Currently, we only have a PyTorch implementation but would welcome it if someone could port it to TensorFlow/Keras (my Tf/Keras skills are just too rusty for it.) submitted by /u/seraschka [link] [comments]  ( 2 min )
    [D] I'm creating a tool to enrich your datasets with relevant external data
    Hey all, I love doing market research and all kinds of exploratory analyses, but getting the data is a major pain point, as it is in many places (data dumps, apis, marketplaces, web data) and in all kinds of formats I'm trying a different approach, where instead of searching for data sources, and then integrating manually, you just upload your dataset. My service has a large index with datasets and api providers, and finds relevant ones for your dataset which you can add easily. ​ example search via sdk Does this seem useful to you? Would love to hear your thoughts submitted by /u/salmiakdrop [link] [comments]  ( 2 min )
    [D] Have there been successful applications of Deep RL to real problems other than board games/Atari?
Some successful applications I am able to gather, mostly from Deepmind: - comma.ai self-driving system. - Weather nowcasting - Tokamak fusion reactor feedback control - Hardware design: New gen. TPU - Datacenter cooling optimization - Adaptive locomotion for quadrupedal robots * - Portfolio optimization (financial instruments) There is a lot of work in games, particularly board games, but these do not really solve something "useful" for society. I have also seen lots of toy examples with libraries like gym and some robotics, but in general these are rather proof-of-concept models or just models that do not work at all. One that actually does work is Solving Rubik’s Cube with a Robot Hand, not regarding the solution of the cube but its dexterous manipulation with a robotic hand. This is pretty cool, but again, the domain of the problem is too narrow to be considered actually a successful application to a real-world problem. So my question is, am I missing some examples? For example, is any company out there trying to apply deep RL to self-driving vehicles or to NLP, and have they had any success? * Boston Dynamics solves this without ML, just good ol' control theory, so this is a 50-50 win for RL. Edit: Thanks everyone for the responses, I have updated the list with more projects from the comments. Edit 2: Took Alphafold out because the current version (2.x) does not use RL. submitted by /u/sid_276 [link] [comments]  ( 4 min )
    [D] Request for Advice: Text-to-Image Synthesis
Hello all, I hope that you are all doing well. The thing is that I want to study text-to-image synthesis, but I see that there is a lot of work already done up until now. I am kind of confused about how to start studying. About my experience: I am trying to learn as much as I can. I started by following Andrew Ng’s Machine Learning course on Coursera, and now I am continuing with the Deep Learning Specialization (almost finished). Apart from these, I check the NVIDIA blog, some YouTube channels, Slack communities and of course here on Reddit, along with a few other channels. As I said, I am trying to follow what’s going on. To tell the truth, I haven't done much in terms of building models. I mean, not a real-life project or something like that. My experience is more based on projects for the courses that I attend/attended. By the way, I don’t know how much it will affect how things turn out, but I have recently begun a graduate program in artificial intelligence as well. Moreover, during my undergraduate program, I took some courses that might help (at least I guess so), including Calculus and Linear Algebra. I also want to mention that as for hardware, there is an NVIDIA Jetson Nano Developer Kit available to me. I hope that what I am asking is clear and that the information I provided helps you answer. If not, please ask me. Best regards. submitted by /u/fgokmenoglu [link] [comments]  ( 1 min )
    [D] Do you use data engineering pipelines for real life projects?
    Usually I am working on huge and complex datasets with millions of rows (or images if it's CV) and most often I just feed them to pandas in a notebook, then transfer the code to a script and run it when it's needed. Then with the result I train my models. No external tools used for this. Do you have experience with data pipeline tools/frameworks and data validation tools/frameworks? For example I just found "Great Expectations" and "Kedro", "Flyte" and I was wondering at which point in time and project complexity should we choose one of these tools instead of the ancient cave man way? Any success/failure stories? submitted by /u/gabegabe6 [link] [comments]  ( 5 min )
    [D] The Joy of Finding Things Out (essay)
    Currently I’m trying to figure out if I want to stay in academia or not. My decision hinges on what makes doing science enjoyable. My current understanding is in order for science to be enjoyable, there has to be an element of surprise. A tension that builds. And release of that tension. This is most obvious in theoretical works where the scientist makes a prediction, and later empirical data verifies that prediction. Excitement. Joy. Wonder. In experimental work, surprise can take the form of not knowing how the experiment will turn out. Once you get the result of the experiment, it'll disambiguate competing theories you had in your mind or elucidate a new theory. "Everything clicks" or at the very least you'll be put into a fever trying to integrate the new surprising data with previous …  ( 10 min )
    [D] A fine grained classification dilemma
Hi Folks, I'm in a bit of a dilemma here on a specific fine grained classification problem that we have. The task at hand is to classify the subspecies of plant seeds. Firstly, the seeds are super tiny, but we have a camera setup to take zoomed images of the seeds. Secondly, even the experts can't tell the exact difference between 2 subspecies of seeds; they go by their experience and intuition. All the subspecies put together look similar, but the data points within the same subspecies are vastly different. The requirement is classifying the subspecies based on their morphological properties. Here is the catch: the requirement is 98%+ accuracy. I tried a few preprocessing methods and models (transformers, resnets, inception and others of that sort) but can't hit the 98% accuracy mark. Even if I could, the model sometimes fails on external sets or in production. I would like some expert (ML side) take on this issue and how to approach it. submitted by /u/happy_happy_feet [link] [comments]  ( 4 min )
    [D] how to preprocess a 13k column dataset.
    I have a single cell rna seq dataset containing 13k features. I would like to preprocess the dataset. What are the best methods to do that? Also, how to apply feature elimination/selection on this unsupervised data? Thanks submitted by /u/Striking-Machine2763 [link] [comments]  ( 1 min )
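For single-cell data the usual unsupervised recipe is count normalisation, a log1p transform, then keeping only the most variable genes; in practice scanpy's `sc.pp.highly_variable_genes` or scikit-learn's `VarianceThreshold` implement this step. The core idea is just per-feature variance ranking, as this stdlib sketch shows (the toy matrix is made up):

```python
def top_variable_features(matrix, k):
    """Return the column indices of the k highest-variance features.

    matrix is a list of rows (cells), each a list of feature values."""
    n = len(matrix)
    variances = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        mean = sum(col) / n
        variances.append((sum((v - mean) ** 2 for v in col) / n, j))
    return sorted(j for _, j in sorted(variances, reverse=True)[:k])

data = [[1, 5, 0], [1, 9, 0], [1, 1, 0]]   # only column 1 varies
print(top_variable_features(data, 1))       # [1]
```

After selection, PCA (e.g. down to ~50 components) is the standard next step before clustering, since no labels are needed for either.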
    [R] Nix-TTS 🐤: An incredibly lightweight text-to-speech via non end-to-end distillation
Hi, Reddit! Excited to share with you guys Nix-TTS 🐤, our latest research in lightweight neural text-to-speech. We've seen how synthetic voices generated by recent neural TTS are getting more and more natural, but most of the time the models suffer from slow CPU inference and are not end-to-end, requiring an additional vocoder model. Nix-TTS 🐤 is an incredibly lightweight end-to-end TTS model achieved by applying non end-to-end knowledge distillation to a powerful yet large-sized generative TTS teacher model. Our proposed model is end-to-end (vocoder-free) with only 5.23M parameters, up to an 82% reduction from the teacher model. We also employed a stochastic duration predictor to improve its expressiveness. It is capable of running 10x faster than real-time on an Intel i7 CPU and 0.5x faster than real-time on a Raspberry Pi Model 3B, making it suitable for deployment in resource-constrained settings. Here we attached the complexity and speedup details from the paper. Nix-TTS speedup and complexity compared to other models. We released the paper (submitted to INTERSPEECH 2022) and the pre-trained models on the links below: 📄 Paper: https://arxiv.org/abs/2203.15643 📦 Repository: https://github.com/rendchevi/nix-tts 🤗 Interactive Demo: https://huggingface.co/spaces/rendchevi/nix-tts A short video demo from the 🤗 HuggingFace Spaces: Nix-TTS Short Demo Let me know what you guys think in the thread! We're very excited to see the potential improvements & applications of this model or method and lightweight TTS in general. Feel free to reach me via DM as well if you'd like to discuss anything further. submitted by /u/sourpeach_ [link] [comments]  ( 2 min )
  • Open

    [D] Current algorithms consistently outperforming SAC and PPO
    Hi community. It has been 5 years now since these algorithms were released, and I don't feel like they have been quite replaced yet. In your opinion, do we currently have algorithms that make either of them obsolete in 2022? submitted by /u/yannbouteiller [link] [comments]  ( 1 min )
    Need Help with Project Idea
    Hey guys! So I am enrolled in a reinforcement learning course at my university, and I am really confused about a decent project idea. Primarily, I want to work on any game based environment apart from atari ones. Using unity seems promising but not sure if that is easy to pull off. Any suggestions to get me started? Thanks submitted by /u/ishon_p [link] [comments]  ( 1 min )
    [D] Completely removing the option taking illegal actions in custom gym environment
Hi, I have a two step custom gym env which is a graph network optimisation task (two steps due to the high dimension action space). In the first step, the agent chooses the first node, and the state and reward are passed to training. In the second step, the agent chooses the second node and, with a class attribute holding the history of the first selected node, now has a node pair for which it can remove or add an edge. The training loop now has a new graph state (edges changed between nodes) plus a vector of the first action selected. The agent is able to learn well; however, even with a negative reward provided to the agent if it takes illegal actions (choosing the same node in each step and thus creating a self connection), even when it learns to maximise reward, it still takes illegal actions…  ( 2 min )
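The standard alternative to penalising illegal actions is action masking: set the logits of illegal actions to negative infinity before the softmax, so they get exactly zero probability and can never be sampled. A stdlib sketch of the idea (in the graph env above, the mask for step two would exclude the node chosen in step one):

```python
import math

def masked_softmax(logits, legal):
    """Softmax over logits with illegal actions forced to probability 0."""
    masked = [l if ok else float("-inf") for l, ok in zip(logits, legal)]
    m = max(masked)
    exps = [math.exp(l - m) for l in masked]
    z = sum(exps)
    return [e / z for e in exps]

# Node 2 was picked in step one, so mask it in step two (no self-edges):
probs = masked_softmax([1.0, 2.0, 3.0], legal=[True, True, False])
print(probs[2])  # 0.0
```

With masking, the policy gradient only ever flows through legal actions, so there is no need for a negative-reward workaround at all.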
    Training drone via DRL to hover with just camera sensor
Hi Everyone This is my first time trying reinforcement learning and Unity and wanted some help. I am trying to train a quadcopter to hover and keep a stationary ball in its camera frame using just visual inputs by a camera sensor. So the input to the network would be an 84*84 grayscale image and the output would be the four forces to the rotors. My reward function is shown in the attached figure (Reward Function), where x and y are the position of the drone and a is the rotation of the drone with respect to the target. I have set A, d and c to 3 and lambda to 1/180. I have also added a condition where if the quadcopter drops below a certain height from the platform, it punishes it with a reward of -1 and resets the episode. The network I am using is the PPO network used by the coin-collector example in mlagents. My training log is below: Training log The cumulative reward and episode length just drop after a while and the value loss explodes. I think something may be wrong with my rewards or network. If anyone has any ideas what might be going wrong, that would be great. Thanks submitted by /u/voyager10 [link] [comments]  ( 1 min )
    Do you know any port of StableBaselines 3 to C++?
    Has anybody done it already? submitted by /u/Live_Medium_3949 [link] [comments]
    Is there a way to get PPO controlled agents to move a little more gracefully?
    submitted by /u/user_00000000000001 [link] [comments]  ( 2 min )
    How to make two policies in TRPO, PPO algorithms?
In both TRPO and PPO, we have r, which is the ratio new_policy/old_policy. Here we collect data from the old policy and improve the new policy. I am confused about how to implement this. Correct me if I am wrong, but I have two ways in mind. 1) I collect the prob while running the simulation. When optimizing, I use the same neural network to sample another action and its prob; this new action and its probability become my new_policy, and then I optimize the L_clip function. 2) I collect the prob while running the simulation. Before optimizing the PPO objective, I first run a simple policy gradient using the original prob I collected. After updating the NN, I once more get new_prob, which I use in the L_clip function. Can someone please tell me which I should do and why? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
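For what it's worth, standard PPO implementations do neither of the two exactly: during the rollout you store the log-probability of the action actually taken; at optimisation time you re-evaluate that same stored action under the current network (no fresh sampling and no separate policy-gradient step first) and form the ratio from the two log-probs. A stdlib sketch of the clipped term for a single sample:

```python
import math

def clipped_surrogate(old_logp, new_logp, advantage, eps=0.2):
    """PPO-clip objective term for one (state, action) sample.

    old_logp: log pi_old(a|s), stored during the rollout.
    new_logp: log pi_theta(a|s), re-evaluated for the SAME action
              under the current network at optimisation time."""
    ratio = math.exp(new_logp - old_logp)
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)

# A ratio far above 1+eps gets clipped when the advantage is positive:
print(clipped_surrogate(old_logp=-2.0, new_logp=-1.0, advantage=1.0))  # 1.2
```

On the first epoch after a rollout, `new_logp == old_logp`, the ratio is 1, and the term reduces to the plain advantage; the ratio only drifts from 1 over the subsequent update epochs.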
    Intrinsic Curiosity Module Pytorch multithreading cpu unable to fix seeds
Hello I am working on an extension of this implementation https://github.com/philtabor/Youtube-Code-Repository/tree/master/ReinforcementLearning/ICM of the intrinsic curiosity module. It uses A3C (actor-critic) as the policy, and the ICM is a bolt-on module. I need to fix the seeds for reproducibility, but no matter what I have tried, I cannot achieve it. The implementation uses multithreading on CPU and plays on the OpenAI Gym CartPole or Atari environments. I believe that it has something to do with the multithreading, but I'm not sure. Does anyone know what the solution might be? submitted by /u/Formal-Drawing-8421 [link] [comments]  ( 1 min )
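With asynchronous workers, bitwise reproducibility is genuinely hard: thread scheduling changes the order of gradient updates even when every RNG is seeded, which is likely why seeding alone isn't working (besides seeding torch, numpy, and the envs, you would need a single worker or synchronous updates for exact runs). What you can at least guarantee is that each worker's own randomness is deterministic, by deriving a distinct per-worker seed from one base seed; a stdlib sketch of that idea:

```python
import random

def worker_rngs(base_seed, num_workers):
    """One independent, reproducible RNG per worker (seed = base + id)."""
    return [random.Random(base_seed + worker_id)
            for worker_id in range(num_workers)]

# Two runs with the same base seed produce identical per-worker streams:
run_a = [rng.random() for rng in worker_rngs(42, 4)]
run_b = [rng.random() for rng in worker_rngs(42, 4)]
print(run_a == run_b)  # True
```

In the A3C code the analogous step would be seeding each worker's env and torch generator from `base_seed + worker_id` at thread start.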
    Is replay buffer can remove "done"?
Hi, these days there are lots of implementations that don't store the next state or the done flag in the replay buffer, like the official DrQ-v2 implementation. My question is: is it okay to throw out "done" from the replay buffer? From my point of view, there are some problems with dropping the done-related signal. Or did I read the implementation code wrong? submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    Do policy gradient methods also require some mechanism for exploration?
Algorithms like A2C, A3C, TRPO and PPO use a stochastic policy, i.e. the actions are sampled from a probability distribution, so exploration should be done inherently by these algorithms. Yet, when I am using PPO to train a bipedal walker, it seems like some extra exploration mechanism is required, because during the training process rewards first go up and then after 1K episodes there is no progress. Please suggest what I can do to stop this from happening. submitted by /u/Better-Ad8608 [link] [comments]  ( 2 min )
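A common remedy is an entropy bonus: subtract a small multiple of the policy's entropy from the loss, so the action distribution is discouraged from collapsing prematurely (in most PPO implementations this is the `ent_coef` term). A stdlib sketch of the term itself:

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete action distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def loss_with_entropy_bonus(policy_loss, probs, ent_coef=0.01):
    # Higher entropy lowers the loss, rewarding exploratory policies.
    return policy_loss - ent_coef * entropy(probs)

uniform = [0.25] * 4
peaked = [0.97, 0.01, 0.01, 0.01]
print(entropy(uniform) > entropy(peaked))  # True
```

If the stall persists, raising `ent_coef` or annealing it slowly toward zero is a typical first experiment.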
  • Open

    Generating new molecules with graph grammar
    An efficient machine-learning method uses chemical knowledge to create a learnable grammar with production rules to build synthesizable monomers and polymers.  ( 6 min )
  • Open

    Whitepaper: Machine Learning Best Practices in Healthcare and Life Sciences
    For customers looking to implement a GxP-compliant environment on AWS for artificial intelligence (AI) and machine learning (ML) systems, we have released a new whitepaper: Machine Learning Best Practices in Healthcare and Life Sciences. This whitepaper provides an overview of security and good ML compliance practices and guidance on building GxP-regulated AI/ML systems using AWS […]  ( 3 min )
  • Open

    Introducing CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
    Posted by Ye Jia and Michelle Tadmor Ramanovich, Software Engineers, Google Research Automatic translation of speech from one language to speech in another language, called speech-to-speech translation (S2ST), is important for breaking down the communication barriers between people speaking different languages. Conventionally, automatic S2ST systems are built with a cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems, so that the system overall is text-centric. Recently, work on S2ST that doesn’t rely on intermediate text representation is emerging, such as end-to-end direct S2ST (e.g., Translatotron) and cascade S2ST based on learned discrete representations of speech (e.g., Tjandra et al.). While early vers…  ( 9 min )
  • Open

    AI-generated pranks for your computer to play on you
    I've tried various methods of using AI to generate April Fools pranks for you to play on other people (although often they turned out to be pranks you play on yourself). But this is the first time I've tried to generate pranks for a computer to  ( 4 min )
    Bonus: Ada's pranks
    AI Weirdness: the strange side of machine learning  ( 1 min )

  • Open

    [D]Are there any good solutions for multimodal classification? Libraries, AutoML tool?
    Hi, Reddit! I'm a data scientist working in the dating app startup field. We have been trying to set up people with blind dates. We can get multimodal data(texts, images, and audio) and we want to do this: collect negative and positive pairs from each user's swiping history and do a binary classification (matching). Then we would get together as many users' data as possible and train a model. Though it sounds nice, this is hard as we could not find an existing developer tool or paper that supports combining such rich multimodal data. Any existing research out there to achieve this task? Moreover, is there already any library or AutoML tool that supports this? Checked Google AutoML, does not support this. Any help and advice would be much appreciated. submitted by /u/meame2010 [link] [comments]  ( 1 min )
    [P] LAION-5B: public dataset of 5.85 billion image-text pairs
    LAION-5B: A new era of open large-scale multi-modal datasets. Twitter thread. Related: [P] LAION-400M: open-source dataset of 400 million image-text pairs. I am not affiliated with this project. submitted by /u/Wiskkey [link] [comments]
    [P] Adapting pixel attribution methods for models that output embeddings
Hi r/MachineLearning, https://github.com/jacobgil/pytorch-grad-cam is a project that has a comprehensive collection of pixel attribution methods for PyTorch (like the package name grad-cam, the original algorithm implemented). Typically pixel attribution methods are adapted for classification: they let you understand what part of the image corresponds to a certain classification category. However, some deep learning models output embeddings instead of category scores. You can then match these embeddings against other embeddings and measure their similarity, for example in face recognition models or in self-supervised networks. In this case, to apply pixel attribution, we can create embeddings of concepts, and then for new query images we would be asking: "what parts of the image have feature representations that match the concept features?" Or in other words: "where in the query image do we see the concepts?" I wrote a tutorial that shows how to use the pytorch-grad-cam project to adapt pixel attribution for the embedding case and visualize where different concept feature representations match the image: https://github.com/jacobgil/pytorch-grad-cam/blob/master/tutorials/Pixel%20Attribution%20for%20embeddings.ipynb An example is the image below. The two left images are "concept" images of clouds and a car. Then, given a new query image, we can try to see where in the image we find feature representations that match these concepts. Given images of concepts and a query image, attribute what parts of the query image match the concepts. I hope someone finds this useful! submitted by /u/jacobgil [link] [comments]  ( 1 min )
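The key ingredient in the embedding case is the attribution target: instead of a class score, you backpropagate a similarity between the model's output embedding and a concept embedding (the tutorial above uses a target along these lines). The scalar itself is just cosine similarity, as in this stdlib sketch:

```python
import math

def cosine_similarity(a, b):
    """Similarity between a query embedding and a concept embedding;
    its gradient w.r.t. the query drives the pixel attribution."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
```

In the PyTorch version, this target would be computed on tensors so autograd can propagate it back to the chosen layer's activations.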
    [D] Does anyone know how to create animations like in the Google AI Blog?
    I really would like to do some visualization of my ideas. I found the animation in the google ai blog: https://ai.googleblog.com/2022/02/4d-net-learning-multi-modal-alignment.html Anyone knows how to do this stuff, especially with the flowing lines? Any software suggestions? submitted by /u/KonArtist01 [link] [comments]  ( 1 min )
    [D] Data centric fixes to a model that's fit to a spurious correlation
Say a computer vision model learns a pattern you don't want it to, and you know that it's learnt it because of analysis through tools like occlusion sensitivity maps. What data-centric techniques can you use to resolve it? Could some form of cropping augmentation do the trick? A classic example is a ruler beside a melanoma: while there may be a correlation between the presence of a ruler and the presence of melanoma, you don't want the model to depend on that information because it may not exist 'in production'. Below is a quotation describing another similar problem. "In another paper a similar issue was found because doctors sometimes use purple markers to highlight potentially-malignant skin cancers for easier examination. Some argue that the purple marks are a real signal that should be incorporated in the model just as the visual appearance of the tumor itself is incorporated. However, if your goal is robust generalizability over time it is probably best to not have your AI incorporate the human applied purple marks as signal, as the standards for applying those marks may vary across teams and across time." https://menloml.com/2020/01/11/recognizing-a-ruler-instead-of-a-cancer/ If you're working with that dataset, what tools are available to you to solve that problem? submitted by /u/Georgehwp [link] [comments]  ( 1 min )
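Cropping-style augmentation can plausibly help when the spurious cue (the ruler) tends to sit near the image border: random crops sometimes remove it, breaking the correlation during training. In PyTorch that would be `torchvision.transforms.RandomResizedCrop` or `RandomErasing`; the crop-box sampling itself is simple, as in this stdlib sketch:

```python
import random

def random_crop_box(height, width, crop_frac, rng):
    """Sample a (top, left, h, w) crop box covering crop_frac of each side."""
    h, w = int(height * crop_frac), int(width * crop_frac)
    top = rng.randint(0, height - h)
    left = rng.randint(0, width - w)
    return top, left, h, w

# Crops drawn this way sometimes exclude a border artifact like a ruler:
rng = random.Random(0)
top, left, h, w = random_crop_box(224, 224, 0.8, rng)
```

Occlusion-guided erasing (masking the detected ruler region directly) is the more targeted variant when you can localise the artifact.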
    [P] Marketing Mix Modeling - How can we solve for negative Media Coefficients?
Hi everyone I'm working on a Marketing Mix Modeling project for a client. I'm using Python and the scikit-learn library to do regression analysis with Ridge and Linear Regression. I have pretty good results: R^2 = 0.87, MAPE = 0.2. But some of my media coefficients are negative, and this doesn't make sense business-wise. How can I model positive media coefficients without using Bayesian modeling? submitted by /u/datagabriele [link] [comments]  ( 3 min )
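One direct option without going Bayesian is to constrain the fit: scikit-learn's `LinearRegression(positive=True)` and `scipy.optimize.nnls` solve least squares with non-negative coefficients. The underlying idea is just projected gradient descent, clipping negative weights to zero after each step, as this small stdlib sketch shows (toy data, not a real MMM):

```python
def nnls_fit(X, y, lr=0.1, iters=2000):
    """Least squares with non-negative coefficients via projected
    gradient descent (clip weights to zero after each step)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(iters):
        preds = [sum(wj * xij for wj, xij in zip(w, row)) for row in X]
        grads = [sum((p - yi) * row[j] for p, yi, row in zip(preds, y, X))
                 for j in range(n_features)]
        w = [max(0.0, wj - lr * g) for wj, g in zip(w, grads)]
    return w

# Channel 0 truly drives sales; the fit is forced to stay non-negative:
X = [[1, 0], [0, 1], [1, 1]]
y = [2, 0, 2]
print(nnls_fit(X, y))  # approximately [2.0, 0.0]
```

Note that a channel forced to zero this way may simply be collinear with another channel, so it is worth checking multicollinearity before interpreting the constrained coefficients.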
    [R] projUNN: efficient method for training deep networks with unitary matrices
    Paper: https://arxiv.org/abs/2203.05483 TL;DR from LeCun: Recurrent nets in which the weight matrix is unitary are interesting beasts: they are invertible, they don't suffer from vanishing/exploding gradients, and they perform computation akin to what happens in quantum computers. Training them can be difficult or expensive. We propose a low-rank (low-cost) update method to update unitary weight matrices with gradient descent. Abstract: In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-k updates -- or their rank-k approximation -- that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full N-dimensional unitary or orthogonal matrices with a training runtime scaling as O(kN^2). Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-T) or transports unitary matrices in the direction of the low-rank gradient (projUNN-D). Even in the fastest setting (k=1), projUNN is able to train a model's unitary parameters to reach comparable performances against baseline implementations. By integrating our projUNN algorithm into both recurrent and convolutional neural networks, our models can closely match or exceed benchmarked results from state-of-the-art algorithms. submitted by /u/lostmsu [link] [comments]  ( 1 min )
    [D] Are text embeddings tabular data?
    I keep hearing that NNs are not the best way to approach tabular data. But when it comes to document classification, in terms of using embeddings for a downstream classification task, would that be considered tabular data? You end up with data that fits in a table that you wish to classify... it's high-dimensional, but you could reduce dimensions until you end up with just a smaller set of columns and the labels. I guess I'm unclear about what defines tabular data in this context, and whether it makes sense to use a different model (like XGBoost) for the classification task, vs. having it as a final layer in the embedding network. submitted by /u/bandalorian [link] [comments]  ( 3 min )
    [R] Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
    Paper: https://arxiv.org/abs/2109.00725 Abstract: A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community. submitted by /u/bikeskata [link] [comments]  ( 1 min )
    [P] AIY Vision Kit
    Working on a group project for college, utilizing an AIY Vision Kit and implementing some variation of machine learning with it. The initial idea was to use the camera to identify playing cards and sort them into groups, like different suits, odd/even, face and non-face, but after meeting with the professor in charge, we were informed that this idea of identifying and sorting is not really marketable. I tried explaining that sorting in these different ways could be used in many different applications and businesses, but was still told it was not really marketable. I was wondering if maybe there was a way to tweak it a bit to make it so, or if maybe turning it into an API would help. Any ideas and/or opinions on the project would be very helpful. Thank you very much. I understand this may fall under beginner-level questions and projects, but I was curious to know whether my group and I are on the right track, as we haven't had any sort of guidance to date. submitted by /u/EETQuestions [link] [comments]  ( 1 min )
    [D] What kind of teams do you work on?
    I am looking to stand up a small ML group in my non big tech but pretty big organization. We have a few use cases, only see the list growing, and would like to have a dedicated group rather than having disparate teams each trying to roll their own algorithms for their own applications. I am starting to look at what the costs might be, and I can see some pushback and lowballing (eg onshore/offshore, skill set, % junior/senior), but I don’t have a lot of stories to start my thoughts with, let alone data. So, I’m interested to know what your teams are like to start thinking about it, and any other data or literature would also be appreciated! submitted by /u/Stranger_Dude [link] [comments]  ( 1 min )
    [R] China researches “brain-scale” AI
    https://mixed-news.com/en/artificial-intelligence-china-researches-brain-scale-ai/ In China, the state and companies are researching AI models with trillions of parameters. They want to prove that they can develop “brain-scale” AI. In the race to build ever-larger AI models, China is showing that cooperation between the state, universities and the private sector holds the potential for gigantic AI models. The researchers are talking about “brain-scale” AI: according to their definition, these are AI models with parameters beyond the 100-trillion mark. ... In a new paper, researchers from Tsinghua University, Alibaba Group, Zhejiang Lab and Beijing Academy of Artificial Intelligence present BaGuaLu, a framework that enables the training of large AI models using the Mixture-of-Experts (MoE) architecture. ... In an initial test, the researchers trained a 1.93-trillion-parameter model with their framework, outperforming Google’s Switch Transformer. They also demonstrate that their framework enables models with 14.5 trillion and a full 174 trillion parameters. ... BaGuaLu could soon be used to train the first models beyond 100 trillion parameters. submitted by /u/Zirius_Sadfaces [link] [comments]  ( 2 min )
    [D] Coreset terrible performance on datasets with a lot of redundancy
    Hello reddit hivemind, This might be a quite specific question but I'm not sure where else to go and ask. I'm currently an intern charged with implementing and comparing different active learning algorithms to see which work best for our specific use case. Since the coreset approach ( [1708.00489] Active Learning for Convolutional Neural Networks: A Core-Set Approach (arxiv.org) ) has been around for a long time, is one of the best documented, and shows promising results in a variety of papers, I implemented it and ran some experiments with it. The results were a bit disappointing: it even got outperformed by the random baseline. To understand the bad performance I dug a bit deeper, since I spent a significant amount of time implementing it. What I pinned down as the issue is using the l_2 norm of the penultimate layer as the metric. This leads to an oversampling of data samples with a certain softmax output due to the way the softmax function behaves. Has anyone experienced the same issue? The only case where I could see coreset being of some use is with a dataset that has a ton of redundant/similar images. Thanks a lot submitted by /u/Fearless-Pumpkin-745 [link] [comments]  ( 1 min )
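    For context, the core of the coreset approach is k-center greedy selection on feature-space distances; a compact numpy sketch of that selection step (illustrative, not the paper's reference code):

```python
import numpy as np

def k_center_greedy(features, labeled_idx, budget):
    """Greedily pick the point farthest from its nearest already-selected
    point, repeat until the labeling budget is spent (coreset/k-center)."""
    # distance of every point to its nearest labeled point
    min_dist = np.min(
        np.linalg.norm(features[:, None] - features[labeled_idx][None], axis=-1),
        axis=1,
    )
    picks = []
    for _ in range(budget):
        i = int(np.argmax(min_dist))           # farthest uncovered point
        picks.append(i)
        # selecting i shrinks every point's nearest-center distance
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(features - features[i], axis=1))
    return picks

rng = np.random.default_rng(0)
feats = rng.random((100, 8))                    # stand-in penultimate features
new_points = k_center_greedy(feats, labeled_idx=[0, 1], budget=5)
print(new_points)
```

Because the selection is driven entirely by the feature metric, any distortion in that space (such as the softmax-induced clustering described above) directly skews which points get queried.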
    [R] Reinforcement Learning in Finance research
    Hello, I hope that this message finds you in good health. FinRL: Deep Reinforcement Learning for Quantitative Finance https://github.com/AI4Finance-Foundation/FinRL is a project from Columbia University. It offers environments for cryptocurrency, paper trading, stock trading, and forex trading. It also has support for three reinforcement learning libraries: Stable Baselines3, RLlib, and ElegantRL. This is from the AI4Finance Foundation, and it aims to provide a plug-and-play platform for RL in finance. Do check it out and help us to improve this project. Some resources: My contributions: https://medium.com/@athekunal/list/finrl-contributions-59de6997c5b1 Resources to learn FinRL: https://github.com/AI4Finance-Foundation/FinRL#tutorials All tutorial notebooks: https://github.com/AI4Finance-Foundation/FinRL/tree/master/tutorials YouTube Channel: https://www.youtube.com/channel/UCrVri6k3KPBa3NhapVV4K5g submitted by /u/A_the_kunal [link] [comments]  ( 1 min )
    [P] Point Cloud Annotation Tool
    Hi Everyone, Some time ago, we had the idea to start building tools to facilitate 3D computer vision development. We started by looking at some of the 3D tools out there and realized that there wasn't anything that fit our needs or could be extended to do some of the things we wanted to do. We started working on a tool to annotate point clouds with different label types (bounding box, rectangles, keypoints) to use as a base for our projects. We recently open sourced the tool, which you can find here: https://github.com/StrayRobots/3d-annotation-tool In the future we might add more tools, for example to paint point clouds or a polygon label type. We would love to hear your feedback on the tool. Has anyone here built any 3D vision datasets? What kind of tools did you use? submitted by /u/slash-dot [link] [comments]  ( 1 min )
    [D] Is it just me or is machine learning difficult to learn?
    Hello, My background: I have worked as a web dev for almost 2 years. Before that, when I was studying in college, I thought ML was the only field in demand. I put my 100% into it, but the professor was so bad that not only me but a lot of my peers found ML and DS to be very difficult. We were able to build projects but never tried to learn more. I tried many Udemy and Coursera courses but never found them engaging. Is it just me, or did you also find ML difficult? Is my approach to learning it wrong? If anyone has any advice I'd really appreciate it. submitted by /u/Notalabel_4566 [link] [comments]  ( 3 min )
    [D] How to combine multimodal data of different sequence lengths in training?
    I'm currently working on a project on multimodal summarization using transformers. My input is an image and a long text, and the output will be a summary of the text pertaining to the image. For extracting image features, I'm using a pretrained ResNet model, which gives me a [49 × 2048] feature matrix per image. For extracting paragraph features, I'm getting embeddings for each sentence, so the data dimension will be [no_of_sentences × 512]. I need to attend to both sets of features and generate output. I have gone through tutorials to understand how transformers work, but I couldn't figure out how to combine these into a single input so that the encoder can attend over both the image and the paragraph at the same time. Any pointers to tutorials would be very helpful. submitted by /u/abisekrk [link] [comments]  ( 1 min )
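    A common recipe is to project each modality to a shared model dimension and concatenate along the sequence axis, so a single encoder attends over all tokens (modality/position embeddings omitted here). A numpy sketch of just the shapes, with a random matrix standing in for the learned projection:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 512

# image features from the pretrained ResNet: 49 spatial positions x 2048 dims
img_feats = rng.random((49, 2048))
# sentence embeddings: one 512-d vector per sentence (30 sentences here)
txt_feats = rng.random((30, 512))

# a learned linear projection (random here, for illustration) maps the image
# features into the model dimension so both modalities share one space
W_img = rng.random((2048, d_model)) * 0.01

img_tokens = img_feats @ W_img                           # (49, 512)
fused = np.concatenate([img_tokens, txt_feats], axis=0)  # (79, 512)
print(fused.shape)
```

The encoder then sees one sequence of 79 tokens; self-attention freely mixes image and text positions. Adding a learned "modality type" embedding to each token is a common refinement so the model can tell the two sources apart.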
    [N] An open letter to DeepMinders
    Yesterday, an anonymous open letter and accompanying Financial Times (paywall) article were published accusing DeepMind of mishandling allegations of sexual abuse by a senior member of the research team. The FT article (reported on here in Fortune) goes on to suggest that the mishandling may have been deliberate in order to exploit legal loopholes in the UK, where victims have a limited amount of time to take a case to an employment tribunal. This comes shortly after Mustafa Suleyman, one of the cofounders, was quietly shuffled out to Google (he has subsequently left and founded a new startup with another DeepMind alum) after he was found to have bullied and humiliated staff for years. Google itself also has a poor record when it comes to sexual harassment, bullying and retaliation at the highest levels, resulting in payouts of hundreds of millions of dollars. Given that DeepMind and Google have a pretty strong grip on the development of AI in terms of employing many of the key people across the various subfields, having access to unparalleled data and compute, and pushing further and further into health (for example the DeepMind offshoot Isomorphic Labs, which is headed by Demis and staffed by DeepMinders, and the various Google healthcare bets and projects), can we really trust them to be stewards of fair and responsible AI development? Bad things happen in all large organizations. But DeepMind isn't that big, and in the past five years its leadership have presided over a steady stream of sexual harassment, bullying and other scandals, handled them all extremely poorly, and shown little sign that things have changed. This points to something rotten in the culture and leadership there and at its parent organization. submitted by /u/ml-anon [link] [comments]  ( 3 min )
    [D] Use of Kesler Construction for Multi-class Prediction
    I noticed that in some literature they use the Kesler construction when discussing multi-class prediction: https://uclanlp.github.io/CS269-17/slides/CS269-03.pdf. Why do they do this versus representing all the w_i vectors in a K x N matrix, where N is the length of x and K is the number of classes, and then generating Wx = [w_1^T x, ..., w_K^T x], which will essentially produce the same result but be more efficient because of the lack of zero multiplications that appear in the Kesler construction? submitted by /u/newperson77777777 [link] [comments]  ( 1 min )
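    The equivalence the question relies on is easy to check numerically: the Kesler vector for class i just places x in the i-th block of an otherwise-zero KN-dimensional vector, so each dot product collapses to w_i^T x. A quick numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 4, 6                # number of classes, feature length
W = rng.random((K, N))     # rows are the w_i vectors
x = rng.random(N)

# compact form: one matrix-vector product
scores = W @ x             # [w_1^T x, ..., w_K^T x]

# Kesler construction: flatten W, embed x into a KN-dim vector per class
w_flat = W.ravel()
kesler_scores = np.array([
    w_flat @ np.concatenate([np.zeros(i * N), x, np.zeros((K - i - 1) * N)])
    for i in range(K)
])

print(np.allclose(scores, kesler_scores))  # True
```

The Kesler form is mainly a notational device: it turns a multi-class problem into a single binary-style problem over one long weight vector, which makes mistake-bound proofs and reductions cleaner, while the matrix form is what you would implement.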
    Prepare data from Databricks for machine learning using Amazon SageMaker Data Wrangler
    Data science and data engineering teams spend a significant portion of their time in the data preparation phase of a machine learning (ML) lifecycle performing data selection, cleaning, and transformation steps. It’s a necessary and important step of any ML workflow in order to generate meaningful insights and predictions, because bad or low-quality data greatly […]  ( 9 min )
    The Myth of Analytic Talent Shortage
    I tested the job market in the last two weeks, both as an applicant, and as a hiring manager. I share my experience here. It is radically different from what you read in the news, or from what most people say. Data scientists and machine learning engineers looking for a new job are out there. The post The Myth of Analytic Talent Shortage appeared first on Data Science Central.  ( 6 min )
    Policy Gradients with pytorch
    What will the shape of the tensor "probs" (marked) look like? Will it look like this: [0.32, 0.40, 0.28]? Or like this: [[0.32, 0.40, 0.28], [0.32, 0.40, 0.28], [0.32, 0.40, 0.28], [0.32, 0.40, 0.28]]? https://preview.redd.it/3qiq3lzz6sq81.png?width=829&format=png&auto=webp&s=c15ece0e38b9dabf3f443466b2553e146845e92a submitted by /u/Whole_Run_4485 [link] [comments]  ( 1 min )
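    The answer depends on what is fed to the network: a single state yields one probability vector, a batch of four states yields four rows, one distribution per state. A framework-agnostic numpy sketch of the two shapes:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

n_actions = 3

# one state -> logits of shape (3,) -> one distribution of shape (3,)
single = softmax(np.random.default_rng(0).random(n_actions))
print(single.shape)

# a batch of four states -> logits (4, 3) -> four distributions, one per row
batch = softmax(np.random.default_rng(1).random((4, n_actions)))
print(batch.shape)
```

So both shapes in the question are possible; whichever one appears follows directly from whether the state tensor has a batch dimension.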
    What are the main roads to AGI?
    I was wondering if you can help me come up with a list of all the specific proposals on how to achieve AGI. For example, one of them is: scaling is all you need. In more detail, scaling self-supervised pretrained deep network models (a.k.a. foundation models), data and compute can lead to AGI (this assumes "smart" scaling, i.e. as steep as possible/cost-efficient exponents in neural scaling laws). Do you know what other main roads there are to AGI? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Sparse Reward Environments and Off Policy Algorithms
    In my experience, (continuous action space) off-policy algorithms are generally more sample efficient than on-policy algorithms but don't perform as well in sparse-reward environments. Are there any papers that address this issue? Do you know of any algorithms that are both sample efficient and learn well in sparse-reward environments? submitted by /u/SirRantcelot [link] [comments]  ( 1 min )
    How to deal with delayed, dense rewards
    This may be a slightly stupid question, but I want to be sure. Assume that in my environment rewards are delayed by a random number n of steps, i.e. the agent takes an action but receives the reward n steps after taking that action. At every step a reward is produced, therefore the reward r_t in transitions s_t, a_t, r_t, s_{t+1} collected by the agent is actually the reward corresponding to the transition at time t-n. An example scenario: the RL agent controls a transportation network, and a reward is generated only when a package reaches its destination. Thus, the reward arrives with possibly several steps of delay with respect to when the relevant actions were taken. Now, I know that delayed rewards are not generally an issue, e.g. all those settings in which there is only one reward +1 at the end, but I am wondering if this case is equivalent. What makes me wonder is that here, from a state s_t onwards to state s_{t+n}, there are n rewards in the middle that depend on states previous to s_t. Does this make the problem non-Markovian? How can one learn the value function V(s_t) if its estimate is always affected by unrelated rewards r_{t-n} ... r_{t-1}? submitted by /u/fedetask [link] [comments]  ( 2 min )
    Sim2Real
    Hi! Does anyone know the “right” way to apply a policy to a robotic manipulator? Right now I'm creating a real environment and simulating the policy on it, but I can't find anything on the web about this. Thanks! submitted by /u/Big-Picture8323 [link] [comments]  ( 1 min )
    Pass a seed arg in gyms reset method to play the same game - undocumented feature!
    I asked a while back how to save the state of an episode, which was my original intention. Even a quick perusal of an environment's code reveals interesting information that's helpful for using gym as a whole: https://www.github.com/openai/gym/tree/master/gym/envs I think it'd be interesting if papers supplied seed numbers for their test and training runs, where they're pulled from an array of ints contained in the agent. submitted by /u/clockface99 [link] [comments]  ( 1 min )
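    The behaviour being described can be mimicked with a toy gym-style environment: reseeding the RNG in `reset` makes the whole episode replayable. The class below is a hypothetical stand-in, not gym's actual implementation, but it shows the contract:

```python
import random

class TinyEnv:
    """Minimal gym-style environment: reset(seed=...) reseeds the internal
    RNG, so the same seed replays the same sequence of observations."""
    def reset(self, seed=None):
        self.rng = random.Random(seed)
        return self.rng.random()          # initial observation

    def step(self, action):
        return self.rng.random()          # next observation (toy dynamics)

env = TinyEnv()
obs_a = [env.reset(seed=42)] + [env.step(0) for _ in range(3)]
obs_b = [env.reset(seed=42)] + [env.step(0) for _ in range(3)]
print(obs_a == obs_b)  # True: same seed, same episode
```

For real reproducibility in experiments you also need to seed the agent's own RNGs (network init, exploration noise), not just the environment.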
    Featured video: L. Rafael Reif on the power of education
    At Monterrey Tec, MIT’s president discusses the impact of education in addressing global issues.  ( 3 min )
    Solving the challenges of robotic pizza-making
    A new technique could enable a robot to manipulate squishy objects like pizza dough or soft materials like clothing.  ( 7 min )
    Last Week in AI podcast: DeepMind Mafia, DishBrain, PRIME, ZooKeeper AI, Instant NeRF
    submitted by /u/regalalgorithm [link] [comments]
    Newbie question
    Hello guys, I was wondering if anyone knew the easiest way to combine images together. Ideally I would have a bunch of images and it would take components of a couple (or just two) and put them together. I want to generate images of morphed anime figures, it doesn’t even need to look professional (or good lol). Just need some sort of website or software that I can easily achieve this. Any tips or ideas would be greatly appreciated!! Thank you! submitted by /u/misakimeifanpage [link] [comments]  ( 1 min )
    Fighting AI's discrimination in mortgage lending | DualFair
    submitted by /u/qptbook [link] [comments]
    Researchers from U Texas and Apple Propose a Novel Transformer-Based Architecture for Global Multi-Object Tracking
    Multi-object tracking aims to locate and track all objects in a video feed. It’s a fundamental component in domains like mobile robots, where an autonomous system must navigate dynamic surroundings populated by other mobile agents. Thanks to breakthroughs in deep learning and object detection, tracking-by-detection has become the dominant tracking paradigm in recent years. Tracking-by-detection simplifies the process by reducing it to just two steps: detection and association. First, an object detector searches each video stream frame for probable items. The second phase is an association step, which connects detections over time. Local trackers are greedy when it comes to pairwise relationships. They keep track of each trajectory’s state based on its position and/or identity traits and correlate current-frame detections with it based on its last visible status. Continue Reading The Research Summary Paper: https://arxiv.org/pdf/2203.13250.pdf Github: https://github.com/xingyizhou/GTR https://i.redd.it/312ahhaxtqq81.gif submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    AI generated Personalized Implicit Neural Avatars (PINA)
    submitted by /u/imapurplemango [link] [comments]
    Instant NeRF: Turn 2D Images into a 3D Models in Milliseconds
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    Dataset with labeled benign and malicious files
    Hi, Reddit, During the project implementation for my bachelor's thesis [1], a piece of software (named dike, after the Greek goddess of justice) capable of analyzing malicious programs using artificial intelligence techniques, I was unable to locate an open-source dataset with labeled malware samples in the public domain. As a result, I created DikeDataset, a dataset with labeled PE and OLE samples [2]. Because it was not the main focus of my thesis, the samples' attributes are not evenly distributed (the benign-malicious and OLE-PE ratios are quite low), but the dataset aided greatly in the research process. This week, I was surprised to see that the public GitHub repository (which was used only for storage, without any promotion in communities like this) gained some organic reach (views, clones and stars). Furthermore, I was thrilled to learn that it was used in a research article published in 2021 [3]! So I'd like to share this project in the hopes that it will be useful to some members of the community. [1] dike [2] DikeDataset [3] Toward Identifying APT Malware through API System Calls submitted by /u/iosifache [link] [comments]  ( 1 min )
    Disney princesses according to AI. Is this done manually or through an AI app?
    submitted by /u/cyberpunk1Q84 [link] [comments]  ( 1 min )
    A-List Celebrities Read my movie script
    Made this video tonight: I had A.I. voices read my script for an upcoming movie. https://www.youtube.com/watch?v=RkK-iGAGcHA submitted by /u/PapermoonPictures [link] [comments]
    Jigsaw fixes bugs in machine-written software
    Large pre-trained language models such as GPT-3, Codex, and others can be tuned to generate code from natural language specifications of programmer intent. Such automated models have the potential to improve productivity for every programmer in the world. But since the models can struggle to understand program semantics, the quality of the resulting code can’t […] The post Jigsaw fixes bugs in machine-written software appeared first on Microsoft Research.  ( 8 min )
    Just Tech: Centering Community-Driven Innovation at the Margins episode 2 with Dr. Tawanna Dillahunt, Zachary Rowe, and Joanna Velazquez
    Episode 134 | March 31, 2022 In “Just Tech: Centering Community-Driven Innovation at the Margins,” Senior Principal Researcher Mary L. Gray explores how technology and community intertwine and the role technology can play in supporting community-driven innovation and community-based organizations. Dr. Gray and her team are working to bring computer science, engineering, social science, and […] The post Just Tech: Centering Community-Driven Innovation at the Margins episode 2 with Dr. Tawanna Dillahunt, Zachary Rowe, and Joanna Velazquez appeared first on Microsoft Research.  ( 32 min )
    An A-peel-ing GFN Thursday Sprouts 20+ New Games Coming to GeForce NOW in April
    In addition to GFN Thursday, it’s National Tater Day. Hooray! To honor the spud-tacular holiday, we’re closing out March with seven new games streaming this week. And a loaded 20+ titles are coming to the GeForce NOW library in April to play — even on a potato PC, thanks to GeForce NOW. Plus, the GeForce Read article > The post An A-peel-ing GFN Thursday Sprouts 20+ New Games Coming to GeForce NOW in April appeared first on NVIDIA Blog.  ( 3 min )
    I only know one output of a neural net at a time. How do I train if I have two outputs?
    submitted by /u/lullek4 [link] [comments]  ( 2 min )
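    A standard answer to the question above is to mask the unknown output in the loss, so only the observed target produces gradient. A hedged numpy sketch (the data is made up; the mask marks which of the two outputs is known for each sample):

```python
import numpy as np

def masked_mse(pred, target, mask):
    """MSE over only the outputs whose target is known (mask == 1).
    Unknown outputs contribute zero loss and hence zero gradient."""
    diff = (pred - target) * mask
    return (diff ** 2).sum() / mask.sum()

# batch of 3 samples, 2 outputs each; only one target known per sample
pred   = np.array([[0.2, 0.9], [0.5, 0.1], [0.7, 0.4]])
target = np.array([[0.0, 0.0], [0.0, 0.3], [1.0, 0.0]])  # unmasked entries only
mask   = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(masked_mse(pred, target, mask))
```

The same idea works in any framework: multiply the per-output loss by the mask before reducing, and normalize by the number of observed entries rather than the batch size.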
    A Guide to Getting Datasets for Machine Learning in Python
    Compared to other programming exercises, a machine learning project is a blend of code and data. You need both to […] The post A Guide to Getting Datasets for Machine Learning in Python appeared first on Machine Learning Mastery.  ( 12 min )

    [R] Making Robots Achieve Tasks Like Animals with Style Transfer + RL
    This work from Berkeley + Google Brain (Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions) describes using RL and GANs for transferring motion styles from animals successfully onto robots. Super neat idea, and love to see ML being used more and more on real robots! Love the Cost of Transport analysis - naturalistic movement really is very efficient, and good luck getting RL to solve the hard exploration problem of good motion and task performance simultaneously tabula rasa! In particular I love this image. Down with hand specified reward functions! Let imitating nature reign supreme. What's next, GANs for moral style transfer? submitted by /u/AristocraticOctopus [link] [comments]  ( 1 min )
    [D] Low MPJPE and low PCK and AUC
    So in the pose estimation landscape (specifically 3D) there are 3 common evaluation metrics: MPJPE (mean per-joint position error, measured in mm), PCK (percentage of correct keypoints, measured at a 150mm threshold), and AUC. One oddity I have noticed when training a few models is that a model with a lower MPJPE than some methods does not always have a higher PCK (higher is better for this metric), even though the mean per-joint position error is substantially below the threshold used for PCK (I do recognize that this metric is an average). Does anyone have any experience with this, seen the same behavior, or have any intuition why this would be occurring? submitted by /u/AbjectDrink3276 [link] [comments]  ( 1 min )
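    The two metrics can decouple because MPJPE averages error magnitudes while PCK only counts threshold crossings: a few huge outlier joints inflate MPJPE while barely denting PCK, and many just-over-threshold joints do the reverse. A small numpy illustration with toy poses:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance in mm."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pck(pred, gt, thresh=150.0):
    """Fraction of joints whose error is under the threshold (mm)."""
    return (np.linalg.norm(pred - gt, axis=-1) < thresh).mean()

gt = np.zeros((2, 17, 3))      # 2 poses, 17 joints, xyz in mm
pred = gt.copy()
pred[0] += 50.0                # pose 0: every joint off by ~87 mm (all "correct")
pred[1, 0] += 400.0            # pose 1: one joint wildly off (~693 mm)

print(mpjpe(pred, gt), pck(pred, gt))  # one outlier dominates MPJPE, not PCK
```

So a method with slightly worse MPJPE but fewer gross failures can still win on PCK, which matches the behavior described above.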
    [R] A Conversational Paradigm for Program Synthesis
    submitted by /u/Wiskkey [link] [comments]
    [R] Training Compute-Optimal Large Language Models. From the abstract: "We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant."
    submitted by /u/Wiskkey [link] [comments]  ( 1 min )
    [D] Some challenges that must be taken care of while working with ML using medical data
    More than a year ago, I wrote an article about some key obstacles you may face when working with AI in the medical field. A few days ago I submitted that article to "Towards Data Science", a Medium-based online publication, and it got published yesterday. I am giving the link here; if anyone is interested in the topic, take a look. It mainly focuses on the point that, even if you have some previous experience working with machine learning, there are some things you must know and be aware of before working with medical datasets. Link - https://towardsdatascience.com/some-key-challenges-in-building-an-ai-model-for-medical-diagnosis-63f7438f14a submitted by /u/ishtiakmahmud [link] [comments]  ( 1 min )
    [D] Is quantum ML pointless?
    Today Google had a webinar on Tensorflow Quantum for big data, which I attended. I was surprised that it was almost all quantum computing theory, but there was a link in the talk resources to Tensorflow Quantum where I was told I could find a tutorial with demo code for a classifier system to compare it to my classical approaches -- I use logistic regression, support vector machine, and Tensorflow DNN classifiers; mostly SVM because it works almost as accurately on my job's data sets as DNNs but takes a tiny fraction of the time to train. So, I took a look at it: https://www.tensorflow.org/quantum/tutorials/mnist This was the first sign that quantum classification might not be a viable alternative: An image size of 28x28 is much too large for current quantum computers. -- https://www.tensorflow.org/quantum/tutorials/mnist#12_downscale_the_images You really have to see it to believe it, but this demo requires downscaling legible digits for handwriting recognition to 4-by-4-pixel, completely indiscernible blobs! This results in, as you might expect, terrible accuracy. A classical model using the full-resolution images achieves 99.9%+ accuracy and takes almost no time to train. So I scrolled down to the "Comparison" section and saw this: a classical model of similar power (~32 parameters) trains to a similar accuracy in a fraction of the time. One way or the other, the classical neural network easily outperforms the quantum neural network. For classical data, it is difficult to beat a classical neural network. The remainder of the tutorial didn't offer any improvement. The "quantum convolutional" NN classifier wasn't any better in speed or accuracy. So, am I correct in assuming that I am best off ignoring quantum computing for classification tasks for the foreseeable future? How long do you think it will be until quantum ML can compete on real-world problems? submitted by /u/Competitive_Travel16 [link] [comments]  ( 4 min )
    [R] CodeGen (up to 16.1B code generating transformer trained on TPU-v4) is open-source
    https://twitter.com/erik_nijkamp/status/1508956485379715072 Paper: https://arxiv.org/abs/2203.13474 Blog: https://blog.salesforceairesearch.com/codegen/ Code: https://github.com/salesforce/CodeGen submitted by /u/lucidraisin [link] [comments]
    [D] Do you know application fields in the sector of machine learning where precise coordination might play a role?
    Self-Synchronizing Oscillators are mainly a hardware-software combination that uses swinging oscillators for decentral synchronization of distributed units without a central steering element. It is a new approach to synchronize two or more entities with another. Instead of relying on a central clock which the other ones communicate with, this technology is mutually or naturally synchronized, so both entities know at any time what the other one is doing. My question would be, are there any possible application fields you could think of for this technology? submitted by /u/mikeseboss [link] [comments]  ( 1 min )
    [D] Predicting and correcting error of a simple model with a lot of data against a much more complex model with less data
    I tried to keep the title somewhat general in case my problem is interesting for others, but before continuing with the discussion I'll introduce some specifics to make it easier to talk about the specific problem. I'm a fluid dynamics engineer, in particular a CFD engineer (fluid simulations and such), working on a phenomenon known as cavitation on hydrofoils. The most common way to perform a full simulation for a cavitating hydrofoil requires approximately 8 hours to run on 512 cores. I'm currently working on an approximate model that solves the same problem in less than a minute on a common laptop. Of course, as an approximate model it is less faithful than the full model, with the relative error increasing or decreasing as some of the foil parameters change. Namely, a foil is defi…  ( 2 min )
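One common approach to the setup described above is multi-fidelity residual modeling: keep the fast approximate model, and fit a cheap regressor on the (few) expensive simulations to predict the fast model's error as a function of the foil parameters. A minimal sketch with synthetic stand-ins for both models (the `sin`/polynomial pair below is purely hypothetical, not the poster's actual CFD setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: hi_fi plays the role of the expensive CFD run,
# lo_fi the fast approximate model; x is a single foil parameter.
x = rng.uniform(0.0, 1.0, size=200)
hi_fi = np.sin(3.0 * x)                 # "full simulation"
lo_fi = 3.0 * x - 4.5 * x ** 3          # "approximate model"

# Fit a cheap correction to the residual hi_fi - lo_fi, then add it back.
coeffs = np.polyfit(x, hi_fi - lo_fi, deg=5)
corrected = lo_fi + np.polyval(coeffs, x)

print(np.abs(hi_fi - lo_fi).max())      # raw error of the fast model
print(np.abs(hi_fi - corrected).max())  # much smaller after correction
```

With real data, the correction model would be fit on the parameter sets where full simulations exist and applied everywhere else; Gaussian-process regressors are a popular choice for this role because they also give error bars.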
    [D],[P] Guidance required for solving a unique problem in application of DL in CFD
    Hello all, I am currently working on a project wherein I have to apply DL to CFD simulations. The simulation data is basically a set of 2D points (x, y) and their corresponding velocity and pressure values. I have such data for 100 timesteps. The goal is to predict the flow (i.e. velocity and pressure) for each grid point at the last time step. I was thinking of reshaping the data in the form of a square so that I can use a CNN, but a CNN alone wouldn't take care of the time dependence in the data. Is there a hybrid approach I can use that takes care of both temporal and spatial dependencies? I really need some guidance; even any unrelated advice would be much appreciated. Thank you in advance! Edit: I also need some help with making the dataset. I have 100 CSV files, each corresponding to one time step. Each file contains the pressure and velocities of around 900 points. How do I make a dataset out of this in either PyTorch or TensorFlow? submitted by /u/Hour_Amphibian9738 [link] [comments]  ( 2 min )
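For the dataset question, one sketch (assuming the 900 points happen to form a regular 30x30 grid; all shapes here are hypothetical): load each CSV into one frame, stack the frames along time, and slice sliding windows so a spatio-temporal model such as a ConvLSTM sees both dependencies at once:

```python
import numpy as np

# Hypothetical shapes: 100 timesteps, 900 grid points, 3 fields (u, v, p).
# In practice each timestep would come from one CSV via np.genfromtxt or
# pandas.read_csv instead of random data.
T, N, C = 100, 900, 3
frames = np.random.rand(T, N, C)

# If the points form a regular grid (sorted by x then y), reshape each
# timestep into an image-like (C, H, W) tensor.
H = W = int(np.sqrt(N))
video = frames.reshape(T, H, W, C).transpose(0, 3, 1, 2)  # (T, C, H, W)

# Sliding windows: the previous k frames predict the next frame, giving a
# CNN+RNN hybrid both spatial and temporal context.
k = 10
inputs = np.stack([video[t:t + k] for t in range(T - k)])  # (T-k, k, C, H, W)
targets = video[k:]                                        # (T-k, C, H, W)
print(inputs.shape, targets.shape)
```

Both PyTorch (`torch.utils.data.TensorDataset`) and TensorFlow (`tf.data.Dataset.from_tensor_slices`) can wrap the resulting arrays directly. If the points do not form a regular grid, graph neural networks on the mesh are the usual alternative.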
    [D] How dstack works
    Hi everyone, I’m the creator of dstack, an open-core tool to train models and manage data. I’ve just published a post where I elaborate on the challenges AI researchers face today when training models, and how we at dstack aim to address them. In the post, you may find the details on the design decisions we made for our tool. Blog post: https://blog.dstack.ai/p/how-dstack-works I invite everyone to read it and share your thoughts. Happy to discuss the future of developer tooling for training models! submitted by /u/cheptsov [link] [comments]  ( 1 min )
    [R] Differentiable Conv Layer using FFT
    This is a convolutional layer for torch using the Fourier transform. I wouldn't be surprised if this already existed somewhere, but I could not find one with derivatives. This is meant to be a drop-in replacement for torch.Conv. It should be performant on kernel sizes above 20, depending on implementation. One interesting thing, even if a person already had one of these, is the way the bias and bias gradient were calculated. It only costs O(out_channels), ignoring the data size entirely. github submitted by /u/MKmisfit [link] [comments]  ( 2 min )
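The idea behind such a layer is the convolution theorem: convolution in the spatial domain is pointwise multiplication in the frequency domain, which wins for large kernels. A minimal 1-D NumPy sketch of that equivalence (illustrative, not the poster's code):

```python
import numpy as np

def fft_conv1d(signal, kernel):
    # Linear convolution via the convolution theorem: zero-pad both inputs
    # to length len(signal) + len(kernel) - 1, multiply their spectra, and
    # transform back.
    n = len(signal) + len(kernel) - 1
    return np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(kernel, n), n)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
k = rng.standard_normal(31)

direct = np.convolve(x, k)    # O(n*m) direct convolution
spectral = fft_conv1d(x, k)   # O(n log n), faster for large kernels
print(np.allclose(direct, spectral))  # True
```

In an autograd framework the gradients come essentially for free, because the FFT and inverse FFT are themselves differentiable linear ops.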
    [R] Fully unsupervised multi-domain image-to-image translation
    The code of "A Style-aware Discriminator for Controllable Image Translation" has been released! This is a novel multi-domain image-to-image translation method, which is fully unsupervised and provides various applications, including style interpolation, content transplantation, and local image translation. Example of the prototype-guided synthesis Example of the reference-guided synthesis Paper: https://arxiv.org/abs/2203.15375 Code: https://github.com/kunheek/style-aware-discriminator submitted by /u/graysp4rrow [link] [comments]  ( 1 min )
    [D] Any well known approaches to compare two sets of neural network weights ?
    Say given an MLP of 2 layers with non-linearity, are there established papers which explore whether the sets of weights obtained after 2 trials of training end up 'similar'? From an old StackExchange thread (2017), two possible methods outlined are: 1. Compare similarity of the predictions on validation inputs. 2. Instead of comparing pairwise similarity, simply concatenate the weights and use t-SNE for dimensionality reduction (based on a 2009 work). Link: https://cs.stackexchange.com/questions/74488/measuring-difference-between-two-sets-of-neural-network-weights Does anyone know of any recent work which tackles this problem? submitted by /u/PaganPasta [link] [comments]  ( 2 min )
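One more recent line of work on this question is representational similarity measures such as Centered Kernel Alignment (Kornblith et al., 2019, "Similarity of Neural Network Representations Revisited"), which compares the activations two trained networks produce on the same inputs rather than the raw weights — raw weights are hard to compare directly because of permutation and rotation symmetries. A small sketch of linear CKA on made-up activations:

```python
import numpy as np

def linear_cka(X, Y):
    # Linear CKA between two (n_examples, n_features) activation matrices;
    # invariant to orthogonal transforms and isotropic scaling.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
acts = rng.standard_normal((500, 64))                            # trial 1
rotated = acts @ np.linalg.qr(rng.standard_normal((64, 64)))[0]  # trial 1, rotated
unrelated = rng.standard_normal((500, 64))                       # something else

print(linear_cka(acts, rotated))    # ~1.0: "same" representation
print(linear_cka(acts, unrelated))  # much lower for unrelated activations
```

Applied to the two MLPs, you would feed both networks the same validation batch, grab each layer's activations, and compare layer-by-layer.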
    [D] Stable Reference for candidate ops in NAS
    Good day, my fellow researchers and engineers. Recently I've been researching neural architecture search. While reading through tons of papers about neural architecture search, I just got curious about 'how do we predefine primitive operations?' Because most papers go like 'we have these ops in candidates, and we searched through candidates so elegantly...' I mean, how do we know we've predefined our candidates well? Is there any reference to 'good ops candidates'? I know certain cells and ops are often used in certain tasks, but still, I want to find a robust reference about 'The ops candidates for NAS'. It will cure my high blood pressure if you guys give your precious opinions about it. submitted by /u/KindAd9527 [link] [comments]  ( 1 min )
    [R]Introducing causal inference in the energy-efficient building design process
    submitted by /u/Mammoth-Ad-5527 [link] [comments]
    [D]: Why does almost no one study "Weakly Supervised Object Detection"(WSOD) since 2020?
    I notice that there is almost no papers in this area since 2020. And the rank of WSOD hasn't been updated since 2020:https://paperswithcode.com/sota/weakly-supervised-object-detection-on-pascal-1 submitted by /u/voclee4 [link] [comments]  ( 2 min )
    [D] Recursive error prediction
    I had an idea recently for an ML regression strategy and I'm just wondering if something like this already exists. It has similarities with both boosting and bagging, but I think it's ultimately different from both. The basic idea is that you start with a subset of input features and train a model on that subset. Any common model will do as long as it doesn't just spit out the exact target value when making a prediction on the training set (i.e., you couldn't use a 1-neighbor KNN). After fitting the model, you make predictions on the training set (with the same subset of features) and calculate the prediction errors. Then using another subset of features (I would think it should be mutually exclusive from the first subset but maybe it doesn't have to be), you train a separate model, but rather than training on the original Y training data, you use the error of the previous model as the target for the second model. You repeat this process until all features have been used or as many times as desired. To make a prediction, the parent model would simply sum the predictions of the child models. As an additional thought, you might use more regularization, larger leaf size for decision trees, etc. for each additional model. You could also use bagging to create multiple instances of the strategy with different feature subsets in order to create multiple "pathways" through the data. A few questions: 1. Is this sufficiently different from existing boosting/bagging techniques? 2. If yes to #1, are there any existing packages (preferably in Python) that implement this kind of technique? 3. Could this be used to reduce overfitting for higher dimensional datasets? If so, would additional steps need to be taken (e.g., like the iterative regularization scheme I mentioned)? My thought is that it's a kind of divide-and-conquer strategy where each subsequent model is asked to do a little less than the previous model. Any thoughts are appreciated. 
submitted by /u/JHogg11 [link] [comments]  ( 2 min )
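For reference, the scheme described above is close in spirit to stagewise additive modeling / gradient boosting, except that each stage is restricted to its own feature subset. A toy sketch of the forward pass, using linear stages on made-up data (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the target depends on two disjoint feature groups.
X = rng.standard_normal((300, 4))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(300)

def fit_linear(A, t):
    # Least squares with an intercept; stands in for "any common model".
    A1 = np.column_stack([A, np.ones(len(A))])
    w, *_ = np.linalg.lstsq(A1, t, rcond=None)
    return lambda B: np.column_stack([B, np.ones(len(B))]) @ w

# Stage 1 fits y on the first subset; stage 2 fits the stage-1 residual on
# the remaining features. The parent prediction is the sum of the stages.
subsets = [[0, 1], [2, 3]]
models, residual = [], y.copy()
for cols in subsets:
    m = fit_linear(X[:, cols], residual)
    models.append((cols, m))
    residual = residual - m(X[:, cols])

pred = sum(m(X[:, cols]) for cols, m in models)
print(np.mean((y - pred) ** 2) < np.var(y))  # the stages reduce the error
```

In scikit-learn terms this resembles gradient boosting with learning rate 1 plus per-stage feature subsampling, which may be the closest off-the-shelf starting point; the mutually exclusive subsets are the genuinely nonstandard part.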
    [R] STaR: Bootstrapping Reasoning With Reasoning
    submitted by /u/hardmaru [link] [comments]
    [R] Pathways: Asynchronous Distributed Dataflow for Machine Learning
    submitted by /u/hardmaru [link] [comments]
    [N] TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
    It provides: A repository of modular and composable building blocks (models, fusion layers, loss functions, datasets and utilities). A repository of examples that show how to combine these building blocks with components and common infrastructure from across the PyTorch Ecosystem to replicate state-of-the-art models published in the literature. These examples should serve as baselines for ongoing research in the field, as well as a starting point for future work. https://github.com/facebookresearch/multimodal submitted by /u/gurkitier [link] [comments]  ( 1 min )
    [R] AI Simulators for Assisted Living (from Facebook, CMU, et al)
    submitted by /u/aidev2040 [link] [comments]
    [R] Desiderata for Representation Learning: A Causal Perspective
    Authors: Yixin Wang, Michael I. Jordan Abstract: Representation learning constructs low-dimensional representations to summarize essential features of high-dimensional data. This learning problem is often approached by describing various desiderata associated with learned representations; e.g., that they be non-spurious, efficient, or disentangled. It can be challenging, however, to turn these intuitive desiderata into formal criteria that can be measured and enhanced based on observed data. In this paper, we take a causal perspective on representation learning, formalizing non-spuriousness and efficiency (in supervised representation learning) and disentanglement (in unsupervised representation learning) using counterfactual quantities and observable consequences of causal assertions. This yields computable metrics that can be used to assess the degree to which representations satisfy the desiderata of interest and learn non-spurious and disentangled representations from single observational datasets. Paper: https://arxiv.org/abs/2109.03795 Slides: https://yixinwang.github.io/papers/causal-rep-slides-public.pdf submitted by /u/bikeskata [link] [comments]  ( 1 min )
    Will Smith's AI Persona was asked about his slapping performance on Oscar Stage
    submitted by /u/kuasha7 [link] [comments]
    Researchers from MIT CSAIL Introduce ‘Privid’: an AI Tool, Build on Differential Privacy, to Guarantee Privacy in Video Footage from Surveillance Cameras
    Surveillance cameras have an identity crisis exacerbated by a conflict between function and privacy. Machine learning techniques have automated video content analysis on a vast scale as these sophisticated small sensors have shown up seemingly everywhere. Still, with increased mass monitoring, there are currently no legally enforceable standards to curb privacy invasions. Security cameras have evolved into wiser and more capable tools than the grainy images of the past, which were frequently used as the “hero tool” in crime dramas. Video surveillance can now assist health regulators in determining the percentage of persons using masks, transportation departments in monitoring the density and flow of automobiles, cyclists and walkers, and businesses in gaining a better understanding of buying habits. But why has privacy remained a second-class citizen? Privid Currently, the footage is retrofitted with blurred faces or black boxes. This prevents analysts from asking some legitimate questions (for example, are people wearing masks?). Dissatisfied with the present status quo, MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) developed a system with other institutions to better guarantee privacy in surveillance video footage. The system, dubbed “Privid,” allows analysts to input video data searches and then adds a tiny amount of noise (additional data) to the result to ensure that no one can be identified. The method is based on a formal notion of privacy known as “differential privacy,” which permits access to aggregate statistics about private data without disclosing individually identifying information. Continue Reading Paper: https://arxiv.org/pdf/2106.12083.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    This guy used AI to create a voice model of Sam O'Nella and made a video in his style.
    submitted by /u/KirbyBWCH [link] [comments]
    Image Classification With Vision Transformers in a Gradio Web App
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Meta AI Team Open-Sources Mephisto: A New Platform For Open And Collaborative Way Of Collecting Data To Train ML Models
    Training datasets are very important for experimenting with varied data to train new AI models. However, many commonly used public data sets contain labeling errors. This makes it challenging to train robust models, particularly for novel tasks. Many researchers use techniques such as employing a variety of data quality control procedures to overcome these shortcomings. However, there is no centralized repository consisting of examples of using these strategies. Meta AI researchers have recently released Mephisto. It is a new platform to collect, share, and iterate on the most promising approaches to collecting training datasets for AI models. Researchers can exchange unique collecting strategies with Mephisto in a reusable and iterable format. It also allows them to change out components and quickly locate the exact annotations required, minimizing the barrier to custom task creation. Continue Reading Github: https://github.com/facebookresearch/Mephisto Documentation: https://mephisto.ai/ https://preview.redd.it/mae7igz11kq81.png?width=1920&format=png&auto=webp&s=764145c4350d73049ae49faafd43ac3806712a2d submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Quantum Computing Memristor To Unlock AI
    submitted by /u/getrich_or_diemining [link] [comments]
    NEW EXCITING STUDY: GRIEVERS AND CHATBOTS
    submitted by /u/annaksig [link] [comments]
    AI that can expand the borders of video?
    Hi, do any of you know the name of an AI that can expand the borders of a video by guessing what would be around the frame of the video? I remember I once saw something like this in a video on the Two Minute Papers YouTube channel, where they used an example with an eagle flying in the sky, but I haven't been able to find the video since, and I have been looking for a long time. Having such an AI would be very helpful for video editing. submitted by /u/Pwichmann [link] [comments]  ( 1 min )
    Best Data Visualization books for Data Science to read in 2022
    submitted by /u/sivasiriyapureddy [link] [comments]
    Explaining overfitting and why 100% accuracy is not a guarantee to clients
    What are your approaches to explaining these topics to business people? submitted by /u/RubiksCodeNMZ [link] [comments]  ( 1 min )
    What metric do you use for hyperparameter tuning?
    Pretty much as the title says. I work in research with electroencephalography (EEG) classification (those electrodes they stick on people's heads). EEG is notoriously noisy and prone to overfitting. I have generally used validation accuracy as an optimization metric for a Bayesian HP tuning approach, but I find this tends to result in fairly unreliable models, even using cross-validation approaches. These models are really noisy, and while they may reach pretty good accuracy, that is often a single epoch where the model got a decent spike. I was wondering if there are common resolutions for this that I have missed, or if anyone has had luck with a custom metric that takes into account not just the validation accuracy, but also the consistency and the difference between validation and training accuracy, to better account for overfitting. Thanks! submitted by /u/Ozzod [link] [comments]  ( 1 min )
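One simple way to fold consistency and the train/validation gap into a single optimization target, as asked above, is a penalized score over cross-validation folds. A sketch (the weights below are arbitrary knobs to tune for your data, not established defaults):

```python
import numpy as np

def tuning_score(train_acc, val_acc, gap_weight=1.0, spread_weight=1.0):
    # Composite objective for HP search: reward mean validation accuracy,
    # penalize fold-to-fold spread and the train/validation gap
    # (a proxy for overfitting).
    train_acc, val_acc = np.asarray(train_acc), np.asarray(val_acc)
    gap = np.maximum(train_acc - val_acc, 0.0).mean()
    return val_acc.mean() - spread_weight * val_acc.std() - gap_weight * gap

# A config with a lucky spike but large spread and gap scores worse than a
# slightly less accurate yet consistent one.
spiky = tuning_score(train_acc=[0.99, 0.98, 0.99], val_acc=[0.90, 0.55, 0.60])
steady = tuning_score(train_acc=[0.75, 0.74, 0.76], val_acc=[0.70, 0.69, 0.71])
print(steady > spiky)
```

Most Bayesian tuning libraries (Optuna, scikit-optimize, etc.) accept any scalar objective, so a score like this can drop in wherever plain validation accuracy was used.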
    Google Docs Now Auto-Generate Short Summaries Using Machine Learning
    Many of us find it difficult to keep up with the daily flood of documents in our inboxes. These could be reports, reviews, briefs, policies, etc. Nowadays, readers wish to have a concise summary including major elements of their document, helping them prioritize their work efficiently. However, writing a document summary from scratch manually is a time-consuming task. To aid document writers in writing content summaries, Google announced a new feature enabling Google Docs to generate suggested summaries automatically when they are available. The team employs a machine learning (ML) model to understand document text and provide a one- to two-sentence natural language description of the material. On the other hand, the document writer retains complete control, choosing whether to accept the proposal as-is, make necessary adjustments to better capture the document summary, or ignore it entirely. This section, combined with the outline, can help readers understand and navigate the work at a high level. While anybody can contribute summaries, only Google Workspace business customers have access to auto-generated suggestions. Continue Reading https://i.redd.it/pcrn8rqmxgq81.gif submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Is there an AI for psychological testing?
    I want to test myself using a neural net in lieu of psychological testing, and was wondering if that's even publicly available. submitted by /u/AdvancedRazzmatazz46 [link] [comments]
    AI beats 8 world champions at bridge
    submitted by /u/nick7566 [link] [comments]
    Why isn't epsilon reset regularly in epsilon greedy policies to aid exploration?
    Surely this improves exploration? E.g. I took a default DQN and after 200k frames in the mountain car env it didn't get to the top, but modifying training to reset epsilon to eps max every 3k steps (with a 1000-frame decay rate from 1 to 0.1) increased the score during training and got -133. After testing a bit I found it helps training if you vary how many steps N you wait between epsilon resets as you train, e.g. I start with 3k for 100k frames, then 10k for 100k frames, and then no resetting. I can't be the first, and I understand there are mathematically much stronger exploration techniques, but is this a poor man's exploration technique? submitted by /u/clockface99 [link] [comments]  ( 2 min )
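The reset scheme described above can be written as a small schedule. A sketch (the constants mirror the post; the snap-back-to-max design is the poster's heuristic, not a standard algorithm):

```python
def epsilon_schedule(total_frames, eps_max=1.0, eps_min=0.1,
                     decay_frames=1000, reset_every=3000):
    # Standard linear decay, except epsilon snaps back to eps_max every
    # `reset_every` frames -- the "poor man's exploration" reset.
    for frame in range(total_frames):
        since_reset = frame % reset_every
        frac = min(since_reset / decay_frames, 1.0)
        yield eps_max + frac * (eps_min - eps_max)

eps = list(epsilon_schedule(9000))
print(eps[0], eps[1000], eps[3000])  # max at each reset, min after the decay
```

For reference, the more principled cousins of this trick include optimistic initialization, count-based exploration bonuses, and noisy networks.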
    Cooperative multi-agent with a global reward function
    In my environment, I have multiple agents that need to cooperate. The reward function is global, such that it depends on the overall state of the system, and not just the sum of each agent reward. Could you point me to some relevant literature in this field? submitted by /u/fedetask [link] [comments]  ( 1 min )
    11 Best Python Books for beginners to advanced to read in 2022 -
    submitted by /u/sivasiriyapureddy [link] [comments]
    What is tau in the Dyna-Q+ algorithm?
    https://imgur.com/UOdDUFH From the linked image I am wondering what tau is (the tau looks like a small r in the image unless you zoom in)? Is it a hard coded value like kappa (k)? If not how is the value for tau determined when Dyna Q+ runs? submitted by /u/lifelifebalance [link] [comments]  ( 1 min )
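For what it's worth, in Sutton & Barto's presentation of Dyna-Q+, tau is not a hard-coded constant like kappa: it is a per-(state, action) counter of how many real time steps have elapsed since that pair was last tried, and planning uses the bonus reward r + kappa * sqrt(tau). A minimal sketch of that bookkeeping:

```python
import math
from collections import defaultdict

kappa = 1e-3
last_tried = defaultdict(int)  # (state, action) -> time of last real visit
t = 0                          # global time step counter

def record_real_step(s, a):
    # Called on every real environment step; this is all that updates tau.
    global t
    t += 1
    last_tried[(s, a)] = t

def planning_reward(s, a, r_model):
    # tau = time since (s, a) was last tried; long-untried pairs get a bonus,
    # which drives the agent to re-check parts of a possibly changed world.
    tau = t - last_tried[(s, a)]
    return r_model + kappa * math.sqrt(tau)

record_real_step("s0", "left")
for _ in range(99):
    record_real_step("s0", "right")   # "left" now untried for 99 steps

print(planning_reward("s0", "left", 0.0) > planning_reward("s0", "right", 0.0))
```

So tau is determined at runtime by the agent's own visit history, while kappa is the hand-set weight on the resulting bonus.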
    Worthwhile to convert custom env to be dm_env compatible?
    Can anyone speak to their experience using acme (https://github.com/deepmind/acme) and by extension dm_env (https://github.com/deepmind/dm_env)? I'm wondering if it would be worthwhile for me to invest the time into converting my custom environment (which loosely follows the standard RL setup) over to this format. I quite like how acme does a lot of heavy lifting in the background and lays out their thoughts on best practices, but perhaps I'm being shortsighted by all the bells and whistles submitted by /u/whynotmehmm [link] [comments]  ( 1 min )
    How is portable AM radio possible?
    The length of antenna you need to receive a radio signal is proportional to the signal’s wavelength, typically 1/2 or 1/4 of the wavelength. Cell phones operate at gigahertz frequencies, and so the antennas are small enough to hide inside the phone. But AM radio stations operate at much lower frequencies. For example, there’s a […] How is portable AM radio possible? first appeared on John D. Cook.  ( 4 min )
    Applications of continued fractions
    At first glance, continued fractions look more like a curiosity than like useful mathematics. And yet they come up surprisingly often in applications. For an irrational number x, the numbers you get by truncating the infinite continued fraction for x are the optimal rational approximations to x given the size of their denominators. For example, […] Applications of continued fractions first appeared on John D. Cook.  ( 2 min )
    Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans
    Four words: smart, sustainable, Super Bowl. Polestar’s commercial during the big game made it clear no-compromise electric vehicles are now mainstream. Polestar Chief Operating Officer Dennis Nobelius sees driving enjoyment and autonomous-driving capabilities complementing one another in sustainable vehicles that keep driving — and the driver — front and center. NVIDIA’s Katie Washabaugh spoke with Read article > The post Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans appeared first on NVIDIA Blog.  ( 2 min )
    The Increasing Importance of Master Data Management for Your Business: A Primer
    By Rex Ahlstrom, CTO & EVP Growth & Innovation, Syniti   The modern enterprise is composed of a variety of systems, each of which holds data the company needs to conduct business: information about products, services, suppliers, customers, and more. This is the master data, and master data collected by these disparate systems is often stored… Read More »The Increasing Importance of Master Data Management for Your Business: A Primer The post The Increasing Importance of Master Data Management for Your Business: A Primer appeared first on Data Science Central.  ( 5 min )
    How ECash Will Change The Economy
    The Biden Administration made a recent announcement that it was setting up an exploratory committee for the creation of an e-Currency taskforce. In conjunction with this, a new bill, the Electronic Currency and Secure Hardware (ECASH) Act, was introduced by Rep. Stephen Lynch (MA-08), Chair of the House Committee on Financial Services’ Task Force on… Read More »How ECash Will Change The Economy The post How ECash Will Change The Economy appeared first on Data Science Central.  ( 4 min )
    Transitions and the Arc of Systems Theory
    DSC Weekly Digest 29 March 2022 Back in September, I made a prediction: Covid-19 would spike throughout the winter but fade by April as it transitioned from being a pandemic virus to an endemic one. As it turns out, I was mostly correct. Here in Washington State, we finally dropped the mask mandate that had… Read More »Transitions and the Arc of Systems Theory The post Transitions and the Arc of Systems Theory appeared first on Data Science Central.  ( 6 min )

    [D] What would a "Production" RL stack look like in terms of tooling?
    I was hoping I could get some insight into the tooling that you use (or would use) for some production RL work? I'm mostly doing it all on my home machine as a fun side project. I've got the following existing infrastructure: - An interface based loosely on the standard RL setup. I'm thinking about adapting it to fit Acme to let it do more heavy lifting since I quite like `Haiku`, `rlax` and the rest of what they do. - I've got some models across languages (Pytorch and Jax) and this has been causing me some headache trying to make sure everything is abstract enough. Should I just stick to one language and make sure all my friends just use that same language? - I'm currently using comet-ml for my experiment tracking, and for the most part I like it. However, I'm now looking around to see what's out there and I'm a little overwhelmed by (1) how many tools there are and (2) how some of them seem to "overlap" so I don't really know how to compose them. - configs all stored in a python file in a separate repo that is kept synced between my other repos. - I currently store my agent experiences (off policy) in a database that I later query to rapidly fill up the replay buffer. The limitation is that this is for a single agent. What drew me to Acme is that it seems to allow multiple agents to all use the same buffer? _____________ tl;dr 1) Has anyone used Acme? I'm thinking of moving my project to it, but it might end up being a lot of effort for very little reward 2) How do you and your teams handle multiple languages? Do you just have abstract gym wrappers that convert data? 3) What tools do you use and how do you compose them together? I'm so so lost trying to navigate this space 4) How do you keep your configs synced when they are used between repos? submitted by /u/whynotmehmm [link] [comments]  ( 1 min )
    [P] I have data with connections and links but I don't know how to write a scrip for this. Help!
    My data are as follows: https://preview.redd.it/9htkypafgeq81.png?width=198&format=png&auto=webp&s=35e5475cf71364b8958b34e6100bd0ada2dc756d What I would like is to be able to express the following as a script: - Value 1440/1 in column FROM references value 144019/1 in column TO. - Find value 144019/1 again in column FROM. - If found, take the value in column TO and look it up again in column FROM; if not found, stop searching. Note: value 1440/1 does not have to be the initial value. In my data, 1440/1 can itself be referenced as a value in column TO. I would like the following as output: - 1440/1, 144019/1, 144019/2; - 1440/1, 144018/1, 144018/2, 6038/1. submitted by /u/Silver-Panda2518 [link] [comments]  ( 1 min )
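A sketch of the chain-following described above, on a hypothetical in-memory version of the table (with real data the `links` dict would be built by grouping the FROM column, e.g. with the `csv` module or pandas):

```python
# Hypothetical FROM -> list-of-TO mapping reconstructed from the table.
links = {
    "1440/1":   ["144019/1", "144018/1"],
    "144019/1": ["144019/2"],
    "144018/1": ["144018/2"],
    "144018/2": ["6038/1"],
}

def chains(start):
    # Depth-first walk: follow each TO value back into the FROM column
    # until a value no longer appears there, collecting complete paths.
    out = []
    def walk(path):
        nexts = links.get(path[-1])
        if not nexts:
            out.append(path)
            return
        for nxt in nexts:
            walk(path + [nxt])
    walk([start])
    return out

for chain in chains("1440/1"):
    print(", ".join(chain))
# 1440/1, 144019/1, 144019/2
# 1440/1, 144018/1, 144018/2, 6038/1
```

Since any value may also appear in the TO column, the true starting points are the FROM values that never occur in TO; running `chains` on each of those enumerates every full path.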
    [D] Quantization Aware Training Advice?
    Hi, I'm trying to run QAT on a MobileNetV2-based model but having some issues hitting the same training losses in the QAT phase as I did in the non-QAT phase. As a test, I trained the network for 1 epoch, then trained the QAT phase for 5 epochs and managed to get the same loss (actually lower). However, after training the model (non-QAT part) for 150 epochs, the QAT phase is really struggling to get down to the same loss. In my first test it dropped, then completely levelled off for 2-3 epochs, then nosedived again for another epoch; I'm not seeing the same in this longer train though. I was wondering if anyone has any advice on things like: should the learning rate be reset at the start of the QAT phase, or should it carry on from where the training left off? I'm using Adam as the optimiser in the first phase; is that still OK in the second phase? Any other things that I could try? I did read a paper on improving quantization loss in MobileNet by L2-weighting the separable conv weights and swapping out ReLU6 for ReLU, but I wasn't really seeing the same benefit in my tests as the paper (https://arxiv.org/pdf/1803.08607.pdf) did; I was getting a worse initial network. Thanks for any insight that anyone can provide! submitted by /u/ColdChancer [link] [comments]  ( 1 min )
    [P] Interactive Demo for Paper SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches
    ​ https://reddit.com/link/trno8g/video/84uuyln8ceq81/player Hi everyone, here's an interactive demo I made for paper SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches Demo: http://47.57.135.203:8001/ Paper: https://arxiv.org/abs/2111.15078 Project page: https://zengxianyu.github.io/sketchedit/ Code: https://github.com/zengxianyu/sketchedit submitted by /u/Educational_Ebb2502 [link] [comments]  ( 1 min )
    [D] DailyML quiz: A very high variance means the model likely has…
    Yesterday's answer: Pandas View Poll submitted by /u/daichrony [link] [comments]
    [P][N] CompilerGym Tutorial @ CGO
    This weekend we (Hugh, Mostafa, and Chris from Meta AI) will be running a tutorial on Autotuning and Reinforcement Learning for compilers using CompilerGym at CGO’22. Join us for a hands-on session that takes you from “zero to RL” in three hours! The tutorial starts at 1:30pm ET on Saturday, April 2nd. Full schedule: https://conf.researchr.org/program/cgo-2022/program-cgo-2022/?date=Sat%202%20Apr%202022 submitted by /u/melhoushi [link] [comments]
    [R]Looking for papers with Date Inference from text
    Hey all, I'm going through research to figure out how date inference might be implemented. Given a text containing the phrase "5 days from now", I would need it to infer the date April 3rd. The 'now' part is a trivial problem, but the inference is something I'm struggling with. I could use regex, but all the possible edge cases are tricky. The inference would need to work on unstructured cases like "April 23rd", "Next Sunday", etc. It would need to work forwards and backwards ("5 days ago", etc.). Any great papers/resources? I searched for date inference but found nothing similar to what I'm looking for. submitted by /u/ISeeThings404 [link] [comments]  ( 1 min )
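A minimal rule-based fallback for the relative cases (the pattern list is illustrative, not exhaustive; libraries like dateparser or Duckling cover far more edge cases and are worth evaluating before rolling your own):

```python
import re
from datetime import date, timedelta

# Each pattern maps a relative phrase to an offset from a reference date.
PATTERNS = [
    (re.compile(r"(\d+)\s+days?\s+from\s+now"),
     lambda m, t: t + timedelta(days=int(m.group(1)))),
    (re.compile(r"(\d+)\s+days?\s+ago"),
     lambda m, t: t - timedelta(days=int(m.group(1)))),
    (re.compile(r"\btomorrow\b"),  lambda m, t: t + timedelta(days=1)),
    (re.compile(r"\byesterday\b"), lambda m, t: t - timedelta(days=1)),
]

def infer_date(text, today):
    for pattern, resolve in PATTERNS:
        m = pattern.search(text.lower())
        if m:
            return resolve(m, today)
    return None  # hand off harder cases ("Next Sunday") to an ML/NER model

today = date(2022, 3, 29)
print(infer_date("I'll deliver it 5 days from now", today))  # 2022-04-03
```

A common hybrid design is exactly this split: a sequence tagger (e.g. a temporal-expression model in the TimeML/SUTime tradition) finds the date expression, and deterministic rules like the above resolve it against the reference date.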
    [Research] AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
    Paper: https://arxiv.org/abs/2203.13448 Code: https://github.com/lijuncheng16/AudioTaggingDoneRight For anyone who's interested in AudioSet (the sound of 2 million YouTube videos). This is a SOTA comparison of models and training procedures. submitted by /u/billyli_16 [link] [comments]
    [P] College course on ML with an object detection project and competition between teams within the class. Help me make the best model possible.
    The objective is simple: a kit with some small hardware is given to us (nuts, bolts, washers, etc.). Using our laptop cameras, we need to develop a model that is able to accurately classify each object placed in front of the camera. There can be any number of objects in any orientation, displayed on any color surface. What is the best way to approach this problem, what is a good model structure (high level), and what can I do to be a step above the competition? submitted by /u/Certified_User [link] [comments]  ( 1 min )
    [D] There is no time to read the textbook as a researcher
    I'm a researcher who is deeply interested in deep generative models. There are excellent textbooks I want to read if time allows, such as: Probabilistic ML: https://probml.github.io/pml-book/book1.html PRML: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf However, there are also many papers I have to read, new theories I have to learn, and work I need to finish. The problem is that reading textbooks could deepen my fundamental understanding of the field but rarely gives an immediate reward. Practically, reading a textbook from start to end can take > 1000 hours, in which time one could read more than a hundred papers. Given the situation, I have studied basic stuff only when I need it for my research (you know, publish or perish). However, I think the time to read textbooks will decrease rather than increase, and only junior researchers will be able to afford to read them. It means that if I don't read them now, I won't be able to read them later. Is there any general advice on this? submitted by /u/SnooPandas3529 [link] [comments]  ( 6 min )
    [P] Will a recommender system alone solve this issue?
    I have a project featuring 5+ years of data detailing mechanic reports. Essentially, I want to use ML to build a model that can make suggested actions to fix an issue based on these mechanic reports. For example, if a user typed in that a car “makes a squeaky sound” then it suggests three courses of action that may fix the issue based on similar issues and solutions detailed in the mechanic reports. Furthermore, when returning these suggestions, I want the user to see some sort of score indicating how likely it is to fix the issue (i.e. option A worked 97% of the time, option B 2% of the time, and option C 1% of the time). I also want the user to be able to try the options and give feedback on if they fixed the issue. My brain immediately went to a recommender system, but I don’t have much experience with creating them. Can they do all of the above (recommend solutions, score solutions, and allow for user input to keep training the model) or do I need to somehow pair with another method/model? I’m just not sure where to start. submitted by /u/ambiguousalmond [link] [comments]  ( 1 min )
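A retrieval-style recommender is indeed a natural starting point here: embed the user's complaint, find the most similar past reports, and surface their fixes with normalized scores. A toy sketch with bag-of-words cosine similarity (the reports and the scoring scheme are made up for illustration):

```python
import math
from collections import Counter

# Hypothetical mechanic reports: (symptom text, fix that worked).
reports = [
    ("engine makes a squeaky sound when turning", "replace serpentine belt"),
    ("squeaky noise from front wheel",            "lubricate brake pads"),
    ("car will not start in cold weather",        "replace battery"),
    ("grinding sound when braking",               "replace brake rotors"),
]

def cosine(a, b):
    # Bag-of-words cosine similarity; a real system would use TF-IDF or
    # sentence embeddings, but the retrieval logic is the same.
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

def suggest(query, top_k=3):
    scored = sorted(reports, key=lambda r: cosine(query, r[0]), reverse=True)
    total = sum(cosine(query, sym) for sym, _ in scored[:top_k]) or 1.0
    # Normalized similarity plays the role of "option A worked X% of the time".
    return [(fix, round(cosine(query, sym) / total, 2))
            for sym, fix in scored[:top_k]]

print(suggest("makes a squeaky sound"))
```

User feedback ("this fixed it" / "it didn't") can be logged as new (symptom, fix) pairs so the same retrieval loop keeps improving. Note, though, that for scores that truly mean "worked 97% of the time" you would aggregate recorded outcomes per fix among the retrieved neighbors, rather than reusing similarity scores as shown here.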
    [R] Understanding Dimensional Collapse in Contrastive Self-supervised Learning
    submitted by /u/fasttosmile [link] [comments]
    Avoid overfitting in iterative pruning [D]
    Avoid overfitting in iterative pruning. For iterative pruning algorithms described in research papers such as: Learning both Weights and Connections for Efficient Neural Networks; Deep Compression; Comparing Rewinding and Fine-Tuning in Neural Network Pruning — I have found that during pruning rounds, the pruned sub-network starts to overfit excessively, with training accuracy approaching almost 100%. This can be attributed to the fact that the surviving trained parameters are not reinitialized, either to their randomly initialized values or to values from earlier in training. In contrast, for "The Lottery Ticket Hypothesis" and its family of related research papers, such as: The Lottery Ticket Hypothesis; Stabilizing the Lottery Ticket Hypothesis; One Ticket to Win Them All; Deconstructing Lottery Tickets — such overfitting is usually not observed, thanks to the weight-rewinding scheme. Since the original, unpruned architecture is already trained with strategies such as data augmentation, weight decay, and a learning-rate schedule, the subsequent iterative pruning rounds result in overfitting. Can you suggest a way to avoid this overfitting during the iterative pruning rounds? submitted by /u/grid_world [link] [comments]  ( 2 min )
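    For reference, the weight-rewinding idea that the lottery-ticket papers use can be sketched in a few lines. This is a toy illustration on a flat list of stand-in weights, not any paper's actual training code: magnitude pruning removes the smallest surviving weights each round, and rewinding resets the survivors to their early-training values instead of continuing from the (possibly overfit) fully trained ones:

    ```python
    import random

    def prune_mask(weights, mask, frac):
        """Zero out the smallest-magnitude fraction of the still-surviving weights."""
        alive = sorted(abs(w) for w, m in zip(weights, mask) if m)
        if not alive:
            return mask
        cutoff = alive[int(len(alive) * frac)]
        return [m and abs(w) >= cutoff for w, m in zip(weights, mask)]

    def rewind(init_weights, mask):
        """Lottery-ticket style: reset surviving weights to their early-training values."""
        return [w if m else 0.0 for w, m in zip(init_weights, mask)]

    random.seed(0)
    init = [random.gauss(0, 1) for _ in range(10)]   # weights saved early in training
    trained = [w * 1.5 for w in init]                # stand-in for fully trained weights
    mask = [True] * 10
    for _ in range(3):                               # three pruning rounds, 20% each
        mask = prune_mask(trained, mask, 0.2)
        trained = rewind(init, mask)                 # the rewind step combats overfitting
    print(sum(mask), "weights survive")
    ```

    Between the rewind and the next prune, one would retrain `trained` with the same regularization (augmentation, weight decay, schedule) as the original run.
    
    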
    Signapse: Harnessing CNNs to Teach Sign Language [Project] [Discussion]
    Hi, I'm a student at the University of Glasgow building a linux app that is trying to use CNNs to teach people the ASL (american sign language) alphabet. We just released the first version of our software which (although admittedly buggy) is worth sharing with interested communities. In brief, a MobileNetv2 model is trained on kaggle data for each sign in the ASL alphabet, this is executed within the OpenCV framework and run on camera frames of the user in real time. The user is challenged to make different signs and rewarded when the correct sign is made. We would love for interested people to try out our software and let us know about enhancement ideas or any bugs they may find. If you are interested in the project, please head over to our GitHub to have a look: https://github.com/albanjoseph/Signapse You can also follow us on Facebook: https://www.facebook.com/Signapse-125793226671815 Twitter: https://twitter.com/GU_Signapse and YouTube: https://www.youtube.com/channel/UCh2uG2pYoSloEU0IFeqDQMA Cheers! Signapse Team submitted by /u/rossythebossy [link] [comments]  ( 1 min )
    [D] Hugging Face Model Comparator Space Builder
    You can now build a space comparing Hugging Face Models and Spaces or create clones of them with Model Comparator Space Builder 📷📷📷 https://huggingface.co/spaces/farukozderim/Model-Comparator-Space-Builder ​ https://preview.redd.it/t40n9amr0cq81.png?width=1813&format=png&auto=webp&s=f151ea9060f7c5b43f8dbcf3f91d1c308bdbb422 ​ https://preview.redd.it/fx1ztghs0cq81.png?width=1848&format=png&auto=webp&s=f1cfcb2b47831b10744b4d0e114fd21ed725b195 Gradio: https://github.com/gradio-app/gradio Hugging Face: https://huggingface.co/ submitted by /u/Mundane-Apartment224 [link] [comments]
    [P] scikit-learn transformer that turns categorical variables into dense vector representations
    Hi everyone. Our DS & DA team open sourced a Python library that helps in dealing with categorical variables for machine learning algorithms. It leverages Tensorflow/Keras embedding layers and builds a neural network that learns a dense representation of each unique class. This is all packaged inside a regular scikit-learn transformer that can be used within pipelines and can have its hyperparameters optimized with regular sklearn methods. Just do pip install embedding-encoder[tf]. Check out the readme at Github or the blog post for examples. Github: https://github.com/cpa-analytics/embedding-encoder PyPI: https://pypi.org/project/embedding-encoder/ Blog post: https://cpa-analytics.github.io/embedding-encoder-intro/ This was inspired by the 3rd place solution in the Rossmann Store Sales Kaggle competition. Some implementations have surfaced over the years but we are not aware of working one that integrates well with existing libraries. This is just another preprocessing technique. It can be optimal for your task or not. As always, try multiple approaches and evaluate the results! submitted by /u/rafa10pj [link] [comments]  ( 2 min )
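    Conceptually, what such a transformer does is replace each category with a learned dense vector. A toy sketch of just the lookup step — in the actual library the table is trained by a Keras network against the target; here it is only randomly initialized, to show the shape of the transformation:

    ```python
    import random

    random.seed(42)
    categories = ["red", "green", "blue", "green", "red"]

    # Build a vocabulary and a small embedding table (dim=3 dense features).
    vocab = {c: i for i, c in enumerate(dict.fromkeys(categories))}
    dim = 3
    table = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in vocab]

    def transform(values):
        """Replace each categorical value with its dense vector (the encoder's output)."""
        return [table[vocab[v]] for v in values]

    dense = transform(categories)
    print(len(dense), len(dense[0]))   # 5 rows, 3 dense features each
    ```

    The appeal over one-hot encoding is that the learned vectors place similar categories near each other, which downstream models can exploit.
    
    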
    [Research] Dealing with variable length input data
    I am building an ML model to classify malware. My input data are Windows binaries which get decompiled into functions. From these functions I create embeddings, each of 150 float numbers. https://preview.redd.it/4ejwz7ukubq81.png?width=581&format=png&auto=webp&s=cbde0b534a8424f2a17ea5f1c77fccd2860685f5 Problem: Each binary has a variable number of functions. Some may have 40 functions while others may have over 1000. Most will have between 50 and 200. The order of the functions is not important. Question: What is the best way to deal with this variable amount of input? The hashing trick? Or Deep Sets? What would you recommend? submitted by /u/laddi27 [link] [comments]  ( 1 min )
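    Since function order doesn't matter, the Deep Sets recipe is a natural fit: encode each function embedding with a shared network, then pool with a permutation-invariant operation (sum or mean) to get a fixed-size vector per binary. A minimal sketch with an identity stand-in for the per-function encoder:

    ```python
    def phi(x):
        """Per-function encoder (identity stand-in for a small shared MLP)."""
        return x

    def deep_set_encode(function_embeddings):
        """Permutation-invariant mean pooling: variable-length set -> fixed vector."""
        n = len(function_embeddings)
        dim = len(function_embeddings[0])
        return [sum(phi(e)[j] for e in function_embeddings) / n for j in range(dim)]

    binary_a = [[0.1] * 150 for _ in range(40)]     # binary with 40 functions
    binary_b = [[0.2] * 150 for _ in range(1000)]   # binary with 1000 functions
    va = deep_set_encode(binary_a)
    vb = deep_set_encode(binary_b)
    print(len(va), len(vb))   # both 150: fixed-size output regardless of set size
    ```

    The pooled vector then feeds an ordinary classifier head; mean pooling is usually preferred over sum when set sizes vary this much, so the scale of the output doesn't depend on the function count.
    
    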
    [Discussion] Podcast with Jonathan Frankle of MosaicML
    Jonathan came on the Weaviate Podcast to discuss the story of MosaicML, their new open-source Python library for efficient deep learning called Composer, Pareto curves of training time vs. accuracy, model-surgery augmentations, maximizing CPU and GPU throughput, and much more! I hope you find this useful; happy to continue discussing what Jonathan presented! https://www.youtube.com/watch?v=ZiBkspwrICA submitted by /u/HenryAILabs [link] [comments]  ( 1 min )
    [D] Using large language models for classification of natural-language input
    Hey everyone! I'd like to use a large language model like T0pp or GPT-NeoX-20B to take a natural-language input from a user and map it to one of ~2000 possible VS Code command palette commands. Essentially, this is a classification problem of the form "NLP input -> command". The idea is to let users give voice input in natural language and then have the model figure out what command they most likely want to activate. Given the number of possible commands I clearly can't rely on prompt design to solve this. It might be a good fit for a model with explicit retrieval augmentation like a memorizing transformer. But that's still a very active area of research without high-quality pre-trained models. Given that, I'm thinking that doing some kind of fine tuning to an existing model is the best bet. But it's unclear to me what the training data should look like... should I just generate a few examples of each command of the form input: "vscode command: 'open new file'", output: "explorer.newFile", and then fine-tune on those? Is there some way to ensure that the model understands that I *always* want it to return one of the commands provided in fine-tuning, instead of arbitrary text? Interested in others' experiences with similar tasks! Background: I'm working on an open source VS Code extension called Clippy AI. Currently it only performs code modifications to the current file and is a thin wrapper around the OpenAI API. But I'd like to use it to automate other editor actions as well! submitted by /u/corbt [link] [comments]  ( 2 min )
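    One pragmatic way to guarantee the output is always one of the ~2000 known commands is to treat the model's free text as a query and snap it to the nearest entry of the command list, rather than trusting the generation directly. A toy sketch using token overlap as a (very crude) similarity measure; a real version would compare embeddings of the command descriptions instead:

    ```python
    def best_command(model_output, commands):
        """Map free-form model text onto the closest known command id by token overlap."""
        out_tokens = set(model_output.lower().replace(".", " ").split())
        def score(cmd):
            cmd_tokens = set(cmd.lower().replace(".", " ").split())
            return len(out_tokens & cmd_tokens)
        return max(commands, key=score)

    # Hypothetical subset of the VS Code command palette
    commands = ["explorer.newFile", "workbench.action.files.save",
                "editor.action.formatDocument"]
    print(best_command("explorer newfile", commands))
    ```

    This post-processing step sidesteps the "model returns arbitrary text" failure mode entirely, which is hard to rule out through fine-tuning alone.
    
    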
    Visualizing Pathologies in Ultrasound Images Using OpenCV and Streamlit [P]
    As part of our AI Challenge with a health-tech startup: https://omdena.com/blog/pathology-streamlit/ https://preview.redd.it/zew2baykgbq81.png?width=640&format=png&auto=webp&s=e66180724c08f22db3d322d8b1fd6f56e8765a3c submitted by /u/Lordobba [link] [comments]
    [D] Is it possible to build a time series model with this dataset?
    So basically, I started my internship in business intelligence, and when my boss learned that I have a background in machine learning and deep learning, he asked me to build a model that predicts a specific number for the next month. So it's a time series problem, and the dataset I have is very small: it starts from May 2019, so it is just 31 rows. And when I plotted the data, it had no clear trend. This picture of the graph looks like my dataset here! dataset (sorry, I cannot share the dataset because of privacy). So I started by differencing the data to remove the seasonality, plus a log transformation, and after that I built models using the ARIMA algorithm, LSTM, and Prophet. I also applied a prediction interval to the predicted number and expected the actual number to fall inside that interval. But unfortunately, the actual number (for this month) was outside the interval. So I decided to look back in the database, and I found a feature that I think may help and that has a high correlation with the main feature; it now becomes a multivariate time series problem. So I tried the VAR algorithm, but unfortunately that model also failed, and the actual number for each feature was outside its interval. This is my first time building a time series model in industry for a real dataset, and I worked alone. So, is there an approach that could help me build a better model, some step I did not follow? Or should I go to my boss and tell him I cannot build a model for this dataset, especially since the data is impacted by the coronavirus? submitted by /u/xxsalehxx140 [link] [comments]  ( 2 min )
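    With only ~31 rows, a useful sanity check is whether any model beats a naive last-value forecast with an empirical interval built from past one-step changes; if not, the data may simply not support a better model yet. A minimal sketch on made-up numbers (the ~95% z-multiplier is a rough normal-theory assumption):

    ```python
    import statistics

    def naive_interval(series, z=1.96):
        """Forecast next value as the last one, with an interval from past one-step changes."""
        diffs = [b - a for a, b in zip(series, series[1:])]
        sd = statistics.stdev(diffs)
        last = series[-1]
        return last - z * sd, last, last + z * sd

    series = [100, 103, 98, 105, 101, 107, 104]   # toy stand-in for the monthly numbers
    lo, point, hi = naive_interval(series)
    print(round(lo, 1), point, round(hi, 1))
    ```

    If even this baseline's interval misses the actuals as often as ARIMA/LSTM/Prophet did, the honest conclusion may be that 31 noisy, COVID-affected observations cannot yet yield a calibrated monthly forecast.
    
    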
    Process Models for Data Science? (academic) [R]
    Hello everyone, I am a German student and am currently writing my master's thesis. It is a rather simple ML task, but for my thesis I need to describe the methodology and which process models are used, and I am new to this. I found CRISP-DM and its successor ASUM-DM. However, and I know that sounds stupid, I am not able to find information on these or useful PDFs. The general information and descriptions are accessible, but I need an official source that I can cite. IBM itself has a link: ftp://ftp.software.ibm.com/software/data/sw-library/services/ASUM.pdf However, it's not working for me as I have no FTP access. So my question is, does anyone have a link where I can find relevant official information on these models? And furthermore, are there any other standards or process models used in industry to describe the approach of working with data? submitted by /u/terektus [link] [comments]  ( 1 min )
    [D] Object Inputs with Multiple Features
    Hello, I am looking at having a neural network take inputs of 8 objects that all have different features/attributes and then an input of the different features/attributes of the environment the objects are in. The output of this would be the rank of each object. The attached image demonstrates a diagram of the neural network. The objects actively interact/compete with each other. I thought about inputting a single object's features and the environment into a neural network with the output as a performance score of the object. However, the object does worse or better depending on what other objects it is competing against. The objects also get better or worse over time, so it may be good to backpropagate and analyze the objects also as a time series. Is it possible to input the object/object features as a matrix? I have not figured out a way to group this data. I was thinking maybe a convolution neural network may work. I am somewhat new to the machine learning world. Any recommendation or help would be great. Thank you https://preview.redd.it/ng7q9haxw7q81.jpg?width=6450&format=pjpg&auto=webp&s=9393006f00583a1a4a9af02044e726559176c403 submitted by /u/hypercar_junkie [link] [comments]  ( 1 min )
    [R] time series clustering resources
    Hi, I am intending to write a paper (roughly 25-30 pages) about time series clustering. I have done my online research; however, I'll be grateful if you can mention some other resources that might be of interest, either theoretical or applied. It can be blogs about machine learning you find interesting in this area, video series, lectures, lecture notes, whatever. Thank you very much. submitted by /u/jiii95 [link] [comments]  ( 1 min )
    Flower Team Releases Flower 0.18 With Cool New Updates For Federated Learning
    Flower is an end-to-end federated learning framework that allows for a smoother transition from simulation-based experimental research to systems research on many real-world edge devices. Flower has individual strengths in both domains (i.e., simulation and real-world devices) and the capacity to switch back and forth between the two extremes as needed throughout exploration and development. The researchers present the use cases that drive their viewpoint, their design goals, the resulting framework architecture, and comparisons to other frameworks. Federated Learning (FL) has proven to be a viable option for enabling edge devices to cooperatively develop a shared prediction model while keeping their training data on the device, decoupling the capacity to execute machine learning from the requirement to store data in the cloud. However, FL is challenging to implement practically at scale and under system heterogeneity. Although there are several research frameworks for simulating FL algorithms, none of them facilitate the investigation of scalable FL workloads on heterogeneous edge devices. Flower 0.18 released Thanks to a longer gap than usual, the latest Flower release has more upgrades than any previous release. Also, thanks to the wonderful community for your continuing support and generosity. Continue Reading Paper: https://arxiv.org/pdf/2007.14390.pdf Github: https://github.com/adap/flower https://preview.redd.it/ywtttqlnceq81.png?width=1920&format=png&auto=webp&s=893fbe79c190aa66e293b296e35d4096eb178f97 submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
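    At the heart of most FL systems, including Flower's built-in strategies, is federated averaging (FedAvg): the server combines client parameter updates weighted by each client's local dataset size. A minimal sketch on toy parameter vectors (not Flower's actual API):

    ```python
    def fed_avg(client_weights, client_sizes):
        """Weighted average of client model parameters by local dataset size (FedAvg)."""
        total = sum(client_sizes)
        dim = len(client_weights[0])
        return [sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
                for j in range(dim)]

    # Two hypothetical edge clients with different amounts of local data
    clients = [[1.0, 2.0], [3.0, 4.0]]
    sizes = [10, 30]
    print(fed_avg(clients, sizes))   # [2.5, 3.5]
    ```

    The raw data never leaves the clients; only these parameter vectors travel to the server, which is the privacy property the post describes.
    
    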
    Carbon vs Silicon
    Since we are about to create a silicon-based lifeform, I was looking up the differences between the carbon and silicon atoms. https://www.differencebetween.com/difference-between-silicon-and-vs-carbon/#:~:text=The%20key%20difference%20between%20silicon,in%20the%20outer%20energy%20level. submitted by /u/asenz [link] [comments]
    Last Week in AI: Super fast 3D perception from Nvidia, Ukraine uses face recognition to identify dead Russian soldiers, US-China AI collaboration drops, and more!
    submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    Google Maps Utilizes Machine Learning To Block Nearly 100 Million Fraudulent Edits
    In their recent post on how Google keeps Maps information reliable, the company elaborates on how it uses machine learning and human operators to block nearly 100 million attempted fraudulent edits to Google Business Profiles. Machine learning, in simple terms, is a form of artificial intelligence (AI) that lets software applications improve their accuracy at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use past data as input to forecast new output values. The world changed with the introduction of vaccinations, revisions to mask regulations, and new COVID variants in 2021. Accordingly, the Maps community updated Google Maps with further information about their local areas. Their contributions have helped Google provide updated business information, such as a location’s hours of operation or its health and safety policies, for 30% more businesses in 2021 than in 2020. Quick Read submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    TinyML Gearbox Fault Prediction on a $4 MCU
    Is it possible to make an AI-driven system that predicts gearbox failure on a simple $4 MCU? How do you automatically build a compact model that does not require any additional compression? Can a non-data-scientist implement such projects successfully? I will answer all these questions in my new project. In industry (e.g., wind power, automotive), gearboxes often operate under random speed variations. A condition monitoring system is expected to detect faults and broken-tooth conditions and assess their severity using vibration signals collected under different speed profiles. Modern cars have hundreds of thousands of parts and systems where it is necessary to predict breakdowns and monitor temperature, pressure, etc. As such, in the automotive industry, it is critically important t…  ( 5 min )
    The metaverse is considered the future of the internet. The term designates a universe beyond the physical world. Watch our video to learn more about it.
    submitted by /u/Nitorblog [link] [comments]  ( 1 min )
    Building Decision Trees - Entropy, Information Gain & Gini Impurity
    submitted by /u/TheNerdyDevYT [link] [comments]
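    The three quantities in the title above are each a few lines of code. A quick sketch on a toy label split:

    ```python
    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a class distribution, in bits."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def gini(labels):
        """Gini impurity: probability of mislabeling a random draw."""
        n = len(labels)
        return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

    def information_gain(parent, left, right):
        """Entropy reduction achieved by splitting parent into left/right."""
        n = len(parent)
        return (entropy(parent)
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))

    parent = ["yes"] * 5 + ["no"] * 5
    left, right = ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4
    print(round(entropy(parent), 3), round(gini(parent), 3),
          round(information_gain(parent, left, right), 3))
    ```

    A decision tree builder evaluates candidate splits with one of these criteria and greedily picks the split with the highest gain (or lowest weighted impurity).
    
    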
    China researches “brain-scale” AI
    https://mixed-news.com/en/artificial-intelligence-china-researches-brain-scale-ai/ From the article: In China, the state and companies are researching AI models with trillions of parameters. They want to prove that they can develop “brain-scale” AI. ... In a new paper, researchers from Tsinghua University, Alibaba Group, Zhejiang Lab and Beijing Academy of Artificial Intelligence present BaGuaLu, a framework that enables the training of large AI models using the Mixture-of-Experts (MoE) architecture. In an initial test, the researchers trained a 1.93-trillion-parameter model with their framework, outperforming Google’s Switch Transformer. They also demonstrate that their framework enables models with 14.5 trillion and a full 174 trillion parameters. ... The team expects that giant multimodal AI models could have far-reaching implications for numerous AI applications. submitted by /u/Sephirio [link] [comments]  ( 1 min )
    9+ Best Deep Learning Books, Beginner to Expert, 2022 [Updated]
    submitted by /u/sivasiriyapureddy [link] [comments]
    JAX + Flower For Federated Learning Gives Machine Learning Researchers The Flexibility To Use The Deep Learning Framework For Their Projects
    Google researchers created JAX to conduct NumPy computations on GPUs and TPUs. DeepMind uses it to aid and expedite its research, and it is increasingly gaining popularity. Differentiation with grad(), vectorization with vmap(), and just-in-time (JIT) compilation with jit() are some of the composable function transformations required for machine learning research in JAX. As a result, adding a JAX-based workload to the Flower code samples is a must-have. The combination of JAX and Flower allows ML and FL researchers to employ the deep learning framework that their projects demand. The updated code example now serves as a template for migrating existing JAX projects to a federated environment. It’s pretty simple to set up a centralized machine learning architecture, and the JAX developer documentation has multiple examples. Because the ML model parameters are stored in the DeviceArray data format, setting up the federated workload requires some knowledge of JAX. To be compatible with the Flower NumPyClient, those parameters must be converted to NumPy ndarrays. The JAX-meets-Flower example demonstrates how such a Flower setup might work. Continue Reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Discussions on Pattern Theory
    Hi all, I have been interested in a field called pattern theory for some time now. It's a mathematical formalism for describing patterns in the world, as well as a framework for developing AI. As far as I can tell, pattern theory seems to be somewhat of a dead field. I'm not sure who is still thinking about it other than a single-digit number of academics (e.g., David Mumford). I find this a bit unfortunate, since, while I'm still a bit naïve about the field, I'd like to find people I could talk to about it. Does anyone have any recommendations for where I could find people to talk to about pattern theory and bounce ideas off of? Thanks in advance submitted by /u/patterntheoryacc [link] [comments]  ( 1 min )
    AI podcast: from neuroscience to deep learning
    submitted by /u/aidev2040 [link] [comments]
    neural networks project
    Hi, hope everyone is doing well. I am an EE student currently taking my first class in machine learning, and I am enjoying it so much. We have now reached the point where we are studying neural networks: backpropagation, CNNs, RNNs, autoencoders, deep learning, etc. My prof wants us to work on a project using neural networks. Basically, he wants us to take some already-written code from GitHub (written with PyTorch), understand it — the task, the motivation, the structure, and the math — modify it if needed, and implement it, and that will be the project; no one starts from scratch. For example: image classification, object detection. He also said that code that identifies digits is too simple for a project; that will be an assignment. So, any ideas, links, or advice in general? Thanks guys, and all the best submitted by /u/Torvaldz_ [link] [comments]  ( 1 min )
    Interactive ML Strategy course with Foster Provost starting April 7
    Sponsored Post Building successful machine learning products requires mastering ML Strategy, including problem formulation, evaluation, and tactics for dealing with […] The post Interactive ML Strategy course with Foster Provost starting April 7 appeared first on Machine Learning Mastery.  ( 2 min )
    A Guide to Obtaining Time Series Datasets in Python
    Datasets from real-world scenarios are important for building and testing machine learning models. You may just want to have some […] The post A Guide to Obtaining Time Series Datasets in Python appeared first on Machine Learning Mastery.  ( 14 min )
    Personalize cross-channel customer experiences with Amazon SageMaker, Amazon Personalize, and Twilio Segment
    Today, customers interact with brands over an increasingly large digital and offline footprint, generating a wealth of interaction data known as behavioral data. As a result, marketers and customer experience teams must work with multiple overlapping tools to engage and target those customers across touchpoints. This increases complexity, creates multiple views of each customer, and […]  ( 10 min )
    Automated, scalable, and cost-effective ML on AWS: Detecting invasive Australian tree ferns in Hawaiian forests
    This blog post is co-written by Theresa Cabrera Menard, an Applied Scientist/Geographic Information Systems Specialist at The Nature Conservancy (TNC) in Hawaii. In recent years, Amazon and AWS have developed a series of sustainability initiatives with the overall goal of helping preserve the natural environment. As part of these efforts, AWS Professional Services establishes […]  ( 11 min )
    Automatically generate model evaluation metrics using SageMaker Autopilot Model Quality Reports
    Amazon SageMaker Autopilot helps you complete an end-to-end machine learning (ML) workflow by automating the steps of feature engineering, training, tuning, and deploying an ML model for inference. You provide SageMaker Autopilot with a tabular data set and a target attribute to predict. Then, SageMaker Autopilot automatically explores your data, trains, tunes, ranks and finds […]  ( 10 min )
    Latest ‘I AM AI’ Video Features Four-Legged Robots, Smart Cell Analysis, Tumor-Tracking Tech and More
    “I am a visionary,” says an AI, kicking off the latest installment of NVIDIA’s I AM AI video series. Launched in 2017, I AM AI has become the iconic opening for GTC keynote addresses by NVIDIA founder and CEO Jensen Huang. Each video, with its AI-created narration and soundtrack, documents the newest advances in artificial intelligence. Read article > The post Latest ‘I AM AI’ Video Features Four-Legged Robots, Smart Cell Analysis, Tumor-Tracking Tech and More appeared first on NVIDIA Blog.  ( 3 min )
    Teens Develop Handwriting-Recognition AI for Detecting Parkinson’s Disease
    When Tanish Tyagi published his first research paper a year ago on deep learning to detect dementia, it started a family-driven pursuit. Great-grandparents in his family had suffered from Parkinson’s, a genetic disease that affects more than 10 million people worldwide. So the now 16-year-old turned to that next, together with his sister, Riya, 14. Read article > The post Teens Develop Handwriting-Recognition AI for Detecting Parkinson’s Disease appeared first on NVIDIA Blog.  ( 3 min )
    Smoothed step function
    I mentioned smoothed step functions in the previous post. What would you do if you needed to concretely use a smoothed step function and not just know that one exists? We’ll look at smoothed versions of the signum function sgn(x) = x / |x| which equals -1 for negative x and +1 for positive x. […] Smoothed step function first appeared on John D. Cook.  ( 2 min )
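    One concrete smoothed version of sgn is tanh(kx): it is smooth everywhere, passes through 0 at the origin, and approaches ±1 as k grows. A quick sketch:

    ```python
    import math

    def smooth_sgn(x, k=10.0):
        """tanh(kx) approximates sgn(x); larger k gives a sharper (but still smooth) step."""
        return math.tanh(k * x)

    print(smooth_sgn(-1), smooth_sgn(0), smooth_sgn(1))
    ```

    The steepness parameter k trades off how closely the function hugs ±1 against how abruptly it transitions near zero.
    
    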
    Partitions of unity, smooth ramps, and CW clicks
    Partitions of unity are a handy technical device. They’re seldom the focus of attention but rather are buried in the middle of proofs. The name sounds odd, but it’s descriptive. A partition of unity is a set of smooth functions into the interval [0, 1] that add up to 1 at every point. The functions […] Partitions of unity, smooth ramps, and CW clicks first appeared on John D. Cook.  ( 2 min )
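    A minimal concrete example of the idea above: the cubic smoothstep ramp s(x) = 3x² − 2x³ (clamped to [0, 1]) and its complement 1 − s(x) form a two-function partition of unity on the real line:

    ```python
    def smoothstep(x):
        """C^1 ramp: 0 for x <= 0, 1 for x >= 1, 3x^2 - 2x^3 in between."""
        if x <= 0:
            return 0.0
        if x >= 1:
            return 1.0
        return 3 * x * x - 2 * x ** 3

    def f1(x):
        return smoothstep(x)

    def f2(x):
        return 1.0 - smoothstep(x)

    # f1 and f2 are smooth, lie in [0, 1], and sum to 1 at every point
    checks = [abs(f1(x) + f2(x) - 1.0) for x in (-0.5, 0.0, 0.25, 0.7, 1.5)]
    print(max(checks))   # 0.0: they sum to one everywhere
    ```

    Larger partitions of unity are built the same way, by dividing each of a family of bump functions by their (everywhere-positive) sum.
    
    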
    Q&A: Alberto Rodriguez on teaching a robot to find your keys
    Associate professor and principal investigator with the MIT Schwarzman College of Computing’s Science Hub discusses the future of robotics and the importance of industry-academia collaborations.  ( 5 min )
    New program bolsters innovation in next-generation artificial intelligence hardware
    MIT AI Hardware Program launches with five inaugural companies to advance AI technologies for the next decade.  ( 5 min )
    Automated Inventory Management System: An Ultimate Guide for 2022 and Beyond
    Inventory management is an essential part of any eCommerce business. Especially if you are an eCommerce business owner juggling multiple sales channels, it can save you a lot of effort. However, manually managing your inventories is also a recipe for error. Also, let’s not forget the time you have to spend and the painful process… Read More »Automated Inventory Management System: An Ultimate Guide for 2022 and Beyond The post Automated Inventory Management System: An Ultimate Guide for 2022 and Beyond appeared first on Data Science Central.  ( 5 min )
    What is the Difference Between Bounce Rate and Exit Rate?
    Statistics gives business owners the freedom to evaluate how their websites are performing. The evaluation involves a couple of things: the bounce rate and the exit rate. But what is the difference between bounce rate and exit rate? This is a point of discussion that requires you to have an open mind to grasp the… Read More »What is the Difference Between Bounce Rate and Exit Rate? The post What is the Difference Between Bounce Rate and Exit Rate? appeared first on Data Science Central.  ( 5 min )
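    The two rates in the post above differ in their denominators: bounce rate is computed over sessions that entered on the page, while exit rate is computed over all views of the page. A quick sketch with made-up numbers:

    ```python
    def bounce_rate(single_page_sessions, entry_sessions):
        """Share of sessions that started on the page and left without a second pageview."""
        return single_page_sessions / entry_sessions

    def exit_rate(exits, pageviews):
        """Share of all views of the page that were the last view of their session."""
        return exits / pageviews

    print(bounce_rate(40, 100), exit_rate(60, 300))   # 0.4 0.2
    ```

    So a page can have a high exit rate but a low bounce rate, e.g. a checkout confirmation page that users reach deep into a session and then naturally leave from.
    
    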
    Five Major Benefits That Microsoft Power BI Brings To Data Scientists
    There is no denying the importance of the internet and IT in the business scene. Businesses hailing from all sectors are dependent on the web, and they also make use of various types of software applications nowadays. However, with time, such technologies are also evolving. Businesses are coping with huge amounts of data, and to… Read More »Five Major Benefits That Microsoft Power BI Brings To Data Scientists The post Five Major Benefits That Microsoft Power BI Brings To Data Scientists appeared first on Data Science Central.  ( 5 min )
    Artificial Intelligence beats 8 world champions at a version of Bridge
    submitted by /u/kevinwangg [link] [comments]  ( 1 min )
    SAC and Position Control in Mujoco
    Hi! I'm currently using garage to simulate a robot end effector (EE) under position control. Does anyone know how to implement a relative action? I mean, I would like to have a small action space reinitialized at each step in order to allow only small increments in position. Thanks! submitted by /u/Big-Picture8323 [link] [comments]  ( 1 min )
    Backpropagation in PPO and OpenAI Gym
    I am currently building a PPO agent for OpenAI Gym's Pong environment using PyTorch and I had a question. So typically the workflow is: Use the state as the input for a CNN and output the action probabilities. Sample the action distribution for an action and perform env.step(action) with it. Obtain the rewards and calculate some reward/loss function such as log_probs * rewards. Backpropagate this loss/reward using loss.backward() and optimizer.step(). Now, PyTorch's loss.backward() and optimizer.step() only calculate and update gradients for PyTorch tensors with requires_grad=True. So how does PyTorch backpropagate through env.step()? env.step() outputs NumPy arrays (if you're using a parallel environment) or integers (not tensors)... Secondly, if I try to convert a tensor output to a NumPy array to input to env.step() — say, an array of actions for parallel environments — doesn't that break my backpropagation? Thirdly, does that mean that env.step() is a differentiable function? Thanks in advance! submitted by /u/Ska82 [link] [comments]  ( 1 min )
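    Short answer to the question above: nothing backpropagates through env.step() at all. Score-function methods (REINFORCE, and PPO's clipped variant) only differentiate log π(a|s); the reward enters as a plain scalar weight, so env.step() can be any non-differentiable black box. A dependency-free two-armed-bandit sketch that makes this explicit (no autograd; the log-prob gradient is written out by hand):

    ```python
    import math
    import random

    random.seed(0)

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        s = sum(exps)
        return [e / s for e in exps]

    def env_step(action):
        """Non-differentiable environment: a plain function returning a float reward."""
        return 1.0 if action == 1 else 0.0

    theta = [0.0, 0.0]          # logits of a 2-action policy
    lr = 0.5
    for _ in range(200):
        probs = softmax(theta)
        a = random.choices([0, 1], weights=probs)[0]
        r = env_step(a)         # no gradient flows through this call
        # REINFORCE: grad of log pi(a) w.r.t. the logits is one_hot(a) - probs
        for i in range(2):
            theta[i] += lr * r * ((1.0 if i == a else 0.0) - probs[i])

    print(softmax(theta))       # policy now strongly prefers the rewarded action
    ```

    This also answers the second question: converting the sampled action tensor to NumPy for env.step() is fine, because that branch of the computation was never meant to carry gradients; only the log_prob branch must stay in the autograd graph.
    
    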
    Policy Gradients with Pytorch.
    Can someone please explain what "pw" is on the third image(marked with arrow) ? P.S: First two images are for context Thank you! ​ https://preview.redd.it/v9ogmrxvn9q81.png?width=753&format=png&auto=webp&s=4ecfc49aabbf328ecccf468ee01aa3767a7a4c8a https://preview.redd.it/4mrnjsxvn9q81.png?width=753&format=png&auto=webp&s=42be60ebaed04468d1b10928ba729327a0057b0f https://preview.redd.it/o01n5sxvn9q81.png?width=753&format=png&auto=webp&s=112e876c252e075829c1693cf0f6a5d4751cf79a submitted by /u/Whole_Run_4485 [link] [comments]

    Can AI create a safer online world?
    submitted by /u/ML_Firefighter [link] [comments]
    Do people build physical perceptrons?
    A friend and I are thinking about building a physical perceptron as a summer project. However, I cannot find any resources on the physical implementation of the perceptron since Rosenblatt's Mark I in the late 1950s. Does anyone do this? What are some good resources? submitted by /u/HoldDoorHoldor [link] [comments]
    Weekly China AI Newsletter: China Strengthens Ethics Reviews on AI, Life Science; Users Can Turn off Recommendation Algorithms; Chinese Self-Driving Startup Raises $400 Million
    submitted by /u/trcytony [link] [comments]  ( 1 min )
    NeRF Research Turns 2D Photos Into 3D Scenes
    submitted by /u/MarS_0ne [link] [comments]
    Artificial Intelligence, Machine Learning and Society
    submitted by /u/pmz [link] [comments]
    11 Best Python Books for Data Science, Beginner to Advanced, to Read in 2022
    submitted by /u/sivasiriyapureddy [link] [comments]
    Meet Jessica From LinkedIn, She Is Not A Human Being
    submitted by /u/satish_gaire [link] [comments]
    A mini-conversation with Kanye West's AI persona ended on a hilarious note
    submitted by /u/kuasha7 [link] [comments]
    DataRobot’s plan to democratize machine learning with no-code AI
    submitted by /u/bendee983 [link] [comments]
    Top 5 Python Time Series Libraries
    submitted by /u/RubiksCodeNMZ [link] [comments]
    Learning to generate line drawings that convey geometry and semantics (CVPR 2022)
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    New AI Tool
    Now unlimited uses for anyone: https://botbox.dev/generator submitted by /u/Recent_Coffee_2551 [link] [comments]
    The latest marketing tactic on LinkedIn: AI-generated faces : NPR
    submitted by /u/Representative-Job23 [link] [comments]
    [P] Any resources on fine-tuning models? - Decision Transformer
    Hello, I'm trying to reproduce the Decision Transformer paper, but I feel seriously lost on how to do it. I can find no documentation on fine-tuning these models and have no idea how to use the datasets. Any help would be much appreciated, thanks. submitted by /u/PM_ME_FREE_GAMES [link] [comments]
    [P] Seems like people are finding my data/ML job aggregator helpful… do you have any feedback for me?
    Hey everybody, around half a year ago I created a simple job aggregator called datajoblist.com, which fetches/scrapes remote jobs in data science, data engineering and AI from multiple sources and presents them in a simple, unified interface. The jobs are collected both directly from companies interesting to me, like Stripe or Shopify, and filtered from job boards such as weworkremotely.com. I have not touched the site since I first built it half a year ago, but it seems that people are finding it helpful: it now gets a fairly stable few thousand unique visitors per month and has facilitated thousands of “apply” click-throughs to company sites. A few dozen people even signed up for the mailing list. So I was thinking about investing a little more time and adding some improvements. Is there any information/functionality that you would like to see there? Shortly, I will be adding the possibility to post jobs for a small fee (till now, all jobs on the site have been aggregated from elsewhere), but I would love to add some usability improvements that are reasonably simple for me to implement. (Perhaps salary ranges, where available?) Thanks for any feedback and have a great day! submitted by /u/k_kristian [link] [comments]  ( 1 min )
    [D] Neural Networks are not the only universal approximators, so why are they so uniquely effective?
    I often hear the success of neural networks attributed to their status as universal approximators, but many algorithms are universal approximators. For example, decision trees can also be universal approximators, yet they don't seem to have nearly as much success. Why is this? What do neural networks have beyond being universal approximators that makes them special? Is this question currently well understood, or is the answer still an area of research? submitted by /u/029187 [link] [comments]  ( 3 min )
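    Universality itself is cheap to demonstrate. In the sketch below (pure NumPy, with all weights hand-set rather than trained, so it is an illustration rather than a proof about learning), both a one-hidden-layer ReLU network and a piecewise-constant partition (the function class a decision tree on one feature represents) drive the worst-case error on sin(x) to zero as capacity grows — so universal approximation alone cannot distinguish the two families:

```python
import numpy as np

# Target function and evaluation grid
f = np.sin
xs = np.linspace(0.0, np.pi, 1001)

def relu_net_approx(n_knots):
    """One-hidden-layer ReLU net with hand-set weights: piecewise-linear
    interpolation of f written as a sum of ReLU units."""
    knots = np.linspace(0.0, np.pi, n_knots)
    ys = f(knots)
    slopes = np.diff(ys) / np.diff(knots)
    # First unit carries the initial slope; each later unit adds the *change*
    # in slope at an interior knot.
    coeffs = np.concatenate([[slopes[0]], np.diff(slopes)])
    def net(x):
        out = ys[0] + coeffs[0] * np.maximum(x - knots[0], 0.0)
        for c, k in zip(coeffs[1:], knots[1:-1]):
            out = out + c * np.maximum(x - k, 0.0)
        return out
    return net

def tree_approx(n_leaves):
    """'Decision tree' on one feature: piecewise-constant on equal bins."""
    edges = np.linspace(0.0, np.pi, n_leaves + 1)
    vals = f(0.5 * (edges[:-1] + edges[1:]))  # one constant per leaf
    def tree(x):
        idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_leaves - 1)
        return vals[idx]
    return tree

for n in (5, 50):
    err_net = np.max(np.abs(relu_net_approx(n)(xs) - f(xs)))
    err_tree = np.max(np.abs(tree_approx(n)(xs) - f(xs)))
    print(n, err_net, err_tree)
```

What universality does not capture — and where the actual research question lives — is how each family behaves under gradient-based training, how smoothly it interpolates between samples, and how it composes features in high dimensions.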
    [D] Why GNNs suffer from over-smoothing but CNNs don't?
    Many articles online say GNNs suffer from over-smoothing because nodes aggregate their neighbors and many nodes share similar sets of neighbors. However, in a CNN each pixel also aggregates its neighbors, yet CNNs can still perform well on pixel-level classification tasks such as segmentation. submitted by /u/AirZealousideal1342 [link] [comments]  ( 1 min )
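    One way to see the asymmetry the question points at: GCN-style aggregation is a fixed averaging operator, and repeatedly applying a row-stochastic averaging matrix on a connected graph collapses all node features toward the same value, whereas a convolution applies a learned, signed kernel that can sharpen just as easily as it smooths. A minimal NumPy sketch (toy path graph, not any particular GNN library):

```python
import numpy as np

# Toy path graph on 6 nodes with self-loops added, GCN-style: A_hat = A + I
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_hat = A + np.eye(n)
P = A_hat / A_hat.sum(axis=1, keepdims=True)  # row-stochastic mean aggregation

x = np.arange(n, dtype=float)  # one scalar feature per node
spread = [x.std()]
for _ in range(100):
    x = P @ x                  # "layer" = replace each node by its neighborhood mean
    spread.append(x.std())
print(spread[0], spread[-1])   # spread across nodes collapses toward 0

# A conv-style *signed* kernel (here a fixed edge detector) need not smooth:
edges = np.convolve(np.arange(n, dtype=float), [-1.0, 0.0, 1.0], mode="same")
print(edges.std())             # stays nonzero: differences are preserved
```

The CNN side also differs structurally: conv layers interleave nonlinearities and channel mixing between spatial aggregations, and an image grid has a large diameter relative to the kernel size, so stacking layers keeps enlarging the receptive field instead of quickly averaging over the whole graph.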
    [D] Paper Review Video - Memory-assisted prompt editing to improve GPT-3 after deployment
    https://youtu.be/gYxJEd3EUKs Large language models such as GPT-3 have enabled many breakthroughs and new applications recently, but they come with an important downside: training them is very expensive, and even fine-tuning is often difficult. This paper presents an adaptive method to improve the performance of such models after deployment, without ever changing the model itself. This is done by maintaining a memory of interactions and then dynamically adapting new prompts by augmenting them with memory content. This has many applications, from non-intrusive fine-tuning to personalization. OUTLINE: 0:00 - Intro 0:40 - Sponsor: Introduction to GNNs Course (link in description) 1:30 - Paper Overview: Improve GPT-3 after deployment via user feedback 5:30 - Proposed memory-based architecture 13:00 - A detailed look at the components 15:00 - Example tasks 24:30 - My concerns with the example setup 26:20 - Baselines used for comparison 29:50 - Experimental Results 34:20 - Conclusion & Comments Paper: https://arxiv.org/abs/2201.06009 Code & Data: https://github.com/madaan/memprompt submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [D] Diversity in Recommendation Systems with a Mostly Unexplored Item List
    Many recommendation systems start with a few very popular items that were heavily marketed. The rest of the item list is largely unexplored. How do recommender systems get around this bias and "test" out new items on users to develop richer training data? I could see how a multi-arm bandit might fix this problem but I'd love to hear other ideas and lessons learned. submitted by /u/Shap177 [link] [comments]  ( 1 min )
    [P] I've released a Python package which lets you generate vector representations of images with a twist: neither PyTorch nor TensorFlow is used!
    https://github.com/minimaxir/imgbeddings Instead, this package uses an ONNX INT8-quantized version of CLIP's Vision layers, which in testing works just as well, with a significant performance boost. The demos also turned out very well and try to be a bit more fun than usual. submitted by /u/minimaxir [link] [comments]  ( 1 min )
    [P] Decision Transformers in Transformers library and in Hugging Face Hub
    Hey there, We’re happy to announce that Edward Beeching from Hugging Face has integrated Decision Transformers, an offline reinforcement learning method, into the 🤗 transformers library and the Hugging Face Hub. In addition, we share nine pre-trained model checkpoints for continuous control tasks in the Gym environment. If you want to know more about Decision Transformers and how to start using them, we wrote a tutorial 👉 https://huggingface.co/blog/decision-transformers We would love to hear your feedback about it. Thanks, submitted by /u/cranthir_ [link] [comments]  ( 1 min )
    [D] Catboost performance on Python vs C++
    Hi fellow nerds, was wondering if anyone has trained the same catboost model on the same dataset in Python and C++ to see which is quicker. Also posting in case someone knows why one language may be inherently quicker. I assume that they are the same program with the same run time but I can’t be too sure of that. Thanks. submitted by /u/econ1mods1are1cucks [link] [comments]  ( 1 min )
    [P] Release the Vision Transformer Cookbook with Tensorflow ! (Thanks to @lucidrains)
    Vision Transformer Cookbook Hello, I have released the Vision Transformer Cookbook with TensorFlow! You can easily use the 22 transformer architectures with just copy & paste. I hope this repository helps many people, including TensorFlow users. Thank you. * code: vit-tensorflow submitted by /u/taki0112 [link] [comments]  ( 1 min )
    "[Discussion] Create a Random Forest Regression to predict multiple values in future using past data"
    I am using Random Forest Regression on power vs. time data from an experiment performed over a certain time duration. Using that data, I want to predict the trend of power in the future, using time as an input. The code that has been implemented is below. The data set consists of approximately 30 hours of power vs. time values (data set used for modelling); only the active_power and time_h columns are used in the algorithm.
    # Creating X and y
    X = np.array(series[['time_h']]).reshape(-1,1)
    y = np.array(series['active_power'])
    # Splitting dataset into training and testing
    X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=0.15, random_state=1)
    # Creating Random Forest model and fitting it on training data
    forest = RandomForestRegressor(n_estimat…  ( 2 min )
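    One caveat worth checking before tuning the model above: with time as the only feature, a random forest cannot extrapolate, because every tree predicts a constant (its last leaf's mean) for any time beyond the training range. A hedged sketch on synthetic data (the trend and values below are invented, not the poster's actual data set):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the post's data: ~30 h of power with an upward trend
t_train = np.linspace(0.0, 30.0, 300).reshape(-1, 1)
power = 100.0 + 2.0 * t_train.ravel() + rng.normal(0.0, 1.0, 300)

forest = RandomForestRegressor(n_estimators=200, random_state=1)
forest.fit(t_train, power)

# Inside the training window the fit is fine...
print(forest.predict([[15.0]]))
# ...but beyond it, every tree falls into its last leaf: the forecast is flat,
# and identical however far ahead you ask.
print(forest.predict([[35.0]]), forest.predict([[60.0]]))
```

If the goal really is a future trend, common workarounds are to fit the trend separately (e.g. a linear model) and let the forest learn only the residuals, or to replace the raw timestamp with features that recur in the future, such as hour-of-day and lagged power values.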
    [Discussion] I am a sample in the dataset I have to analyze
    Basically the title. I work as a data engineer for a company I am also a customer of. From an Ethics in ML point of view: what do you think this implies on my responsibilities? submitted by /u/Bani57 [link] [comments]  ( 1 min )
    [D] Everything about Attention Family
    Hey, I have just published my latest Medium article. These days in deep learning it is usual to hear about transformers’ outstanding performance on challenges where other algorithms cannot meet our expectations, and most transformers are built on attention. This article gives you a detailed illustration of the code and mathematics of the four most-used types of attention in the deep learning era. https://rezayazdanfar.medium.com/everything-about-attention-family-644747903c60 submitted by /u/rezayazdanfar [link] [comments]  ( 1 min )
    [R] New t-SNE guidelines, an experimental study, and automatic t-SNE hyperparameter selection
    t-SNE remains a popular embedding method for visualizing high-dimensional data. However, there is little consensus on how to select hyperparameters such as perplexity, learning rate, and exaggeration to best visualize arbitrary data sets. This work systematically explores t-SNE hyperparameters using almost 700 data sets. We replicate past studies, confirming that some t-SNE guidelines generalize beyond their original context, but we find that others do not appear to. We also show a proof-of-concept neural network system for featurizing data sets and automatically recommending good t-SNE hyperparameters. Paper: https://osf.io/6t5ax/ Blog: https://twosixtech.com/new-guidance-for-using-t-sne/ submitted by /u/rpgove [link] [comments]  ( 3 min )
    [R] Text to Mesh Without 3D Supervision Using Limit Subdivision (Clipmesh)
    submitted by /u/InfamousPancakes [link] [comments]  ( 2 min )
    [P] MLbot – Open-source tool to train ML models in your cloud, with a single command.
    Hey ML Reddit! I just released the initial version of MLbot (https://github.com/thecooltechguy/mlbot): a new open-source tool that I’ve been working on for running distributed ML training jobs in your cloud, with a single command. How it works: in short, it lets you run your training script in the cloud by simply swapping “python” for “mlbot run”. For example, if `python train.py …` can run your training script locally, then `mlbot run --instance-type p3dn.24xlarge --num-nodes 2 train.py …` should be able to run your code in the cloud across 2 GPU machines. Since this tool runs entirely inside your cloud environment, you don’t have to transfer your training data to a 3rd party, while having full observability into the underlying infrastructure. Why I built this: In a recent ML pr…  ( 2 min )
    [D] What is the following NLP task called - explaining WHY someone feels a particular way in a product review? For example with the sentence - "I don't like the tone of the guitar because the strings are too old", the explanation for negative sentiment of guitar should be "strings are too old".
    Looking for ideas and pointers on how to solve this problem. Dependency parsing? Are there any open source ML models to solve this problem? Googling isn't helping. Made a mistake in the description - we want to explain the negative sentiment of the aspect "tone" (rather than guitar) submitted by /u/ml_guy1 [link] [comments]  ( 1 min )
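    For search terms: this is usually framed as aspect-based sentiment analysis (ABSA), and the "why" span is close to what the literature calls opinion or reason extraction (searching for "aspect sentiment triplet extraction" turns up open models and datasets). Before reaching for an ML model, a crude rule-based baseline that grabs the clause after a causal connective is a reasonable starting point — the connective list below is an illustrative guess, not exhaustive:

```python
import re

# Hypothetical baseline: return the clause following a causal connective.
CAUSAL = re.compile(r"\b(?:because|since|due to)\b\s*(.+?)[.!?]?$", re.IGNORECASE)

def extract_reason(sentence: str):
    m = CAUSAL.search(sentence)
    return m.group(1).strip() if m else None

print(extract_reason("I don't like the tone of the guitar because the strings are too old"))
# -> the strings are too old
```

A learned version would treat this as sequence labeling over (aspect, opinion, reason) spans; the rule-based output above is still useful for bootstrapping weak labels.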
    [D] Difference between Research and Applied Research track in conferences?
    Hello, I am in the process of finding a suitable conference for my paper, and I find that they usually have a research track and applied research track. One conference defines the applied research track as The Applied Research Track aims at attracting submissions from both industry and academia that either solve or advance the understanding of issues related to deploying AI, Information Retrieval (IR), and big data technologies as part of actual applications. My paper is roughly related to applying deep learning for time series anomaly detection. Should I go for the research track or the applied research track? submitted by /u/mythrowaway0852 [link] [comments]  ( 1 min )
  • Open

    Basic neural network robots learning by genetic algorithms - survival of the fittest! Crazy simulations - source code included
    submitted by /u/djrobsmith [link] [comments]  ( 1 min )
    Decision Transformers in Transformers library and in Hugging Face Hub 🤗
    Hey there 👋🏻, We’re happy to announce that Edward Beeching from Hugging Face has integrated Decision Transformers an Offline Reinforcement Learning method, into the 🤗 transformers library and the Hugging Face Hub. In addition, we share nine pre-trained model checkpoints for continuous control tasks in the Gym environment. If you want to know more about Decision Transformers and how to start using it, we wrote a tutorial 👉 https://huggingface.co/blog/decision-transformers We would love to hear your feedback about it, In the coming weeks and months, we will be extending the reinforcement learning ecosystem by: Being able to train your own Decision Transformers from scratch. Integrating RL-baselines3-zoo Uploading RL-trained-agents models into the Hub: a big collection of pre-trained Reinforcement Learning agents using stable-baselines3 Integrating other Deep Reinforcement Learning libraries Implementing Convolutional Decision Transformers for Atari And more to come 🥳, so 📢 The best way to keep in touch is to join our discord server to exchange with us and with the community. Thanks, submitted by /u/cranthir_ [link] [comments]  ( 1 min )
    Discount factor when the agent is available at different rates
    Assume that the time between two agent actions is not fixed, i.e. depending on the state-action, the agent can become unavailable for a time t = t(s, a). During the time the agent is unavailable, several rewards are produced by the environment, and they need to be given to the agent whenever it becomes available again. One easy way to deal with this is to just store them and set the reward at the next available state as the sum of the accumulated rewards. But in the discounted reward framework with temporal difference (e.g. DQN) this does not discount rewards properly. How can I set the reward for the next state such that it contains all the accumulated rewards but it is correct in the DQN setting? submitted by /u/fedetask [link] [comments]  ( 2 min )
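    For reference, one standard fix (the semi-MDP view used for options and n-step returns, sketched here rather than quoted from any one paper) is to discount each interim reward by its own delay and to discount the bootstrap term by gamma**t instead of gamma:

```python
# If action a keeps the agent busy for t steps and the environment emits
# rewards r_0 .. r_{t-1} in the meantime, the TD target for Q(s, a) becomes
#   sum_i gamma**i * r_i  +  gamma**t * max_a' Q(s', a').
# Plain summation (the "easy way" in the post) is the special case gamma = 1
# inside the gap, which over-weights late rewards.

def accumulated_reward(rewards, gamma):
    return sum(gamma ** i * r for i, r in enumerate(rewards))

def td_target(rewards, gamma, next_q_max):
    t = len(rewards)
    return accumulated_reward(rewards, gamma) + gamma ** t * next_q_max

# Three unit rewards during a 3-step gap, then bootstrap from Q = 10:
print(td_target([1.0, 1.0, 1.0], gamma=0.9, next_q_max=10.0))
```

In a DQN replay buffer this just means storing (s, a, R, s', t) instead of (s, a, r, s') and using gamma ** t in the loss.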
    Current State-of-the-art RL algorithms
    What are the current best algorithms in reinforcement learning? It seems everyone still uses TD3, SAC, PPO, Rainbow DQN, etc. However, these are mostly from 2018, which is old by RL standards. What happened afterwards? What is the current algorithm of choice for these kinds of standard tasks? I'm especially interested in algorithms that can handle continuous action spaces. Thank you very much! submitted by /u/Paraiso93 [link] [comments]  ( 2 min )
    noob question about Bellman's optimality principle
    I'm reading Sutton & Barto, and on page 63 it is written that v_star(s) = max_a q_star(s,a). My question is: why do we have this? Where does it come from? I'm trying to start from the definitions of v_star and q_star but I can't really find a way. submitted by /u/samas69420 [link] [comments]  ( 2 min )
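    For what it's worth, the identity follows in two short steps from the definitions on that page (a sketch, in the book's notation):

```latex
% For any policy \pi, the state value is an average over actions, so
v_\pi(s) \;=\; \sum_a \pi(a \mid s)\, q_\pi(s,a)
        \;\le\; \max_a q_\pi(s,a)
        \;\le\; \max_a q_*(s,a).
% Taking \max_\pi on the left gives  v_*(s) \le \max_a q_*(s,a).
% Conversely, the policy that picks \arg\max_a q_*(s,a) in s and then acts
% optimally attains exactly \max_a q_*(s,a) from s, so v_*(s) \ge \max_a q_*(s,a).
% The two inequalities together give  v_*(s) = \max_a q_*(s,a).
```

Intuitively: the value of a state under an optimal policy cannot beat its best action's value, and acting greedily with respect to q_star shows that value is achievable.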
  • Open

    What’s Your Business Model Choice – Hammers or Casino?
    We are in the middle of a business model revolution.  And we are active participants in that revolution.  We have been transitioning from a society where possession and application of physical commodities defined wealth and power, to a society where possession and application of knowledge define wealth and power.  Throughout the 20th century, oil had… Read More »What’s Your Business Model Choice – Hammers or Casino? The post What’s Your Business Model Choice – Hammers or Casino? appeared first on Data Science Central.  ( 7 min )
    Datacenter relocation is now easier, faster, and more affordable
    It is common for growing organizations to reach a point where their existing data solution is no longer adequate for their needs. In most cases, it happens with companies that have used an on-premises infrastructure from the earliest days of business but now need to upgrade their network for continued growth. However, relocating equipment and… Read More »Datacenter relocation is now easier, faster, and more affordable The post Datacenter relocation is now easier, faster, and more affordable appeared first on Data Science Central.  ( 3 min )
    The Evolution of Astronomical AI
    Astronomy has seen an exponential rise in data collection over the last decade. This requires new methods for data analysis, including AI. With the launch of new surveys, big data methodology has become a necessity. A new class of extremely large telescopes has evolved to collect vast amounts of data; The volume of data collected… Read More »The Evolution of Astronomical AI The post The Evolution of Astronomical AI appeared first on Data Science Central.  ( 4 min )
    What to Do About the New AI Regulation?
    When such a sophisticated, risky, and complex technology like AI takes our lives by storm, a clearly defined set of rules on its usage is paramount. Previously, public concern was mostly focused on the inappropriate use of personal data. As AI becomes a key technology in many businesses and services, the attention is rightfully shifting… Read More »What to Do About the New AI Regulation? The post What to Do About the New AI Regulation? appeared first on Data Science Central.  ( 4 min )
    How Automation and AI Are Changing Internet Marketing
    From chatbots and other remote helpers to producing content, improving client encounters, AI companies are now rolling out significant improvements to the advanced promoting scene. While it might be hard to foresee what’s to come, it’s not difficult to see that AI will proceed to advance and assume an undeniably focal point in computerized advertising.… Read More »How Automation and AI Are Changing Internet Marketing The post How Automation and AI Are Changing Internet Marketing appeared first on Data Science Central.  ( 4 min )
    How Python Became THE Language for Data Science
    What is Data Science? Data science is a study that helps us to extract information from a set of structured or unstructured data. It makes use of the study of statistics, mathematics, scientific computation to analyze the data.  Demand for Python in Data Science: Before we deep dive into the topic let’s firstly discuss why… Read More »How Python Became THE Language for Data Science The post How Python Became THE Language for Data Science appeared first on Data Science Central.  ( 7 min )
    Three Critical Steps for Data-Driven Success
    In today’s landscape, businesses need to look for any competitive advantage they can to ensure their survival, growth and success. A key aspect of gaining a competitive advantage is using data-driven insights to empower decisions for marketing, consumer insights, consumer segmentation, and operations, such as merchandising and real estate.  Especially within large companies, it is… Read More »Three Critical Steps for Data-Driven Success The post Three Critical Steps for Data-Driven Success appeared first on Data Science Central.  ( 4 min )
    Data Governance Tool: What To Look For?
    Data governance is the management of organizations’ data availability, usability, integrity, security, and privacy. According to Gartner, Data governance is the specification of decision rights and a framework for accountability to assure acceptable behavior in the value, generation, consumption, and control of data and analytics. Why Do Organizations Need It? It ensures that data is consistent,… Read More »Data Governance Tool: What To Look For? The post Data Governance Tool: What To Look For? appeared first on Data Science Central.  ( 3 min )
    Top MDM-Enabled Data Security Hacks You Should Know About
    Introduction Security is the buzzword for the digital world today. Businesses have realized that thriving and surviving without a well-functioning security system in place is tough. Security breaches, malware, ransomware and similar incidents are real. The businesses that have suffered from malicious attacks very well know how grave these attacks can be.  The importance of… Read More »Top MDM-Enabled Data Security Hacks You Should Know About The post Top MDM-Enabled Data Security Hacks You Should Know About appeared first on Data Science Central.  ( 5 min )
  • Open

    Security tool guarantees privacy in surveillance footage
    “Privid” could help officials gather secure public health data or enable transportation departments to monitor the density and flow of pedestrians, without learning personal information about people.  ( 6 min )
  • Open

    Looking for the next prime
    Suppose you start with some large number x and want to find a prime number at least as big as x. First you test whether x is prime. Then you test whether x + 1 is prime. Then you test whether x + 2 is prime, and so on until you find a prime. Of […] Looking for the next prime first appeared on John D. Cook.  ( 3 min )
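    The scan described here is easy to make concrete; a sketch using a deterministic Miller-Rabin primality test (the listed witness bases are known to be exact for all n below roughly 3.3e24):

```python
def is_prime(n: int) -> bool:
    """Deterministic Miller-Rabin; these witness bases are exact for n < ~3.3e24."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is composite
    return True

def next_prime(x: int) -> int:
    """Smallest prime >= x, by the scan the post describes: x, x+1, x+2, ..."""
    n = max(x, 2)
    while not is_prime(n):
        n += 1
    return n

print(next_prime(10**12))
```

For very large x one would skip even candidates and sieve by small primes first, but because prime gaps near x are only on the order of (ln x)^2 on average, this direct translation is already fast for 64-bit inputs.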
  • Open

    Agile, Agile 2 and Agility, Part I
    If you are running a business today using Agile methods, it’s likely that you are not getting the productivity boost from it that you should, and your time to market for new features is probably not what it could be either. Is that the end of the world?  By and large, yes!  The problem is… Read More »Agile, Agile 2 and Agility, Part I The post Agile, Agile 2 and Agility, Part I appeared first on Data Science Central.  ( 5 min )
    Commercial Artificial Intelligence — The Future of BI
    The dynamics of the global commercial artificial intelligence market continues to change over time, thanks to the persistent advancements in technology. This research report offers a detailed and insightful assessment of the global commercial artificial intelligence market, taking primary trends and the future prospects of this market in consideration. Various segments of this market, based on a… Read More »Commercial Artificial Intelligence — The Future of BI The post Commercial Artificial Intelligence — The Future of BI appeared first on Data Science Central.  ( 3 min )
    GitHub Co-Pilot Alternatives: Can They Match the Functionality of Co-Pilot?
    One of the best-known examples of GPT-3 for developers is GitHub Copilot. Trained on billions of lines of public code, GitHub Copilot is more than code autocomplete. GitHub Copilot is powered by Codex, the new AI system created by OpenAI. GitHub Copilot understands significantly more context than most code assistants. GitHub Copilot… Read More »GitHub Co-Pilot Alternatives: Can They Match the Functionality of Co-Pilot? The post GitHub Co-Pilot Alternatives: Can They Match the Functionality of Co-Pilot? appeared first on Data Science Central.  ( 3 min )
    Five Key Components of a Data Sharing Platform
    Increasingly, companies are focused on finding ways to connect to new and valuable sources of data in order to enhance their analytical capabilities, enrich their models, or deliver more insight to their business units.  Due to the increased demand for new data sources, companies are also looking at their internal data differently. Organizations that have… Read More »Five Key Components of a Data Sharing Platform The post Five Key Components of a Data Sharing Platform appeared first on Data Science Central.  ( 7 min )
    Smart Maintenance – How SaaS Frameworks Turn Insights Into Actions Quickly And Efficiently
    Despite the technological breakthroughs in the advent of Industry 4.0, manufacturers seem to have taken a more gradual approach to adoption. In 2020, less than 30 percent of the industry considered themselves extensive users of advanced integrated tools and processes. The pandemic, however, brought out an unprecedented need to explore opportunities that make manufacturing systems… Read More »Smart Maintenance – How SaaS Frameworks Turn Insights Into Actions Quickly And Efficiently The post Smart Maintenance – How SaaS Frameworks Turn Insights Into Actions Quickly And Efficiently appeared first on Data Science Central.  ( 5 min )
    Toll-free number: What is it, and how can you get one for your business?
    What is the toll-free number? Businesses provide a cloud-based contact number to allow customers to contact them free of cost. In India, this number- the business toll-free number is available in the 1800 series in an easily recognizable format- 1800-ABC-DEFG. Customers do not have to incur any fee to contact the business, as the company… Read More »Toll-free number: What is it, and how can you get one for your business? The post Toll-free number: What is it, and how can you get one for your business? appeared first on Data Science Central.  ( 4 min )
    Top Strategies and Best Practices for Big Data Testing
    With the exponential growth in the number of big data applications in the world, testing in big data applications spans database, infrastructure and performance testing, and functional testing. The advancement of technology is enabling the collection of a massive amount of data almost every second. And big data has emerged as the buzzword… Read More »Top Strategies and Best Practices for Big Data Testing The post Top Strategies and Best Practices for Big Data Testing appeared first on Data Science Central.  ( 3 min )
  • Open

    Seed vault, but for code
    I had heard of the Svalbard Global Seed Vault, but I hadn’t heard of the nearby Arctic World Archive until today. The latter contains source code preserved on film, a format that should last at least 500 years. Seed vault, but for code first appeared on John D. Cook.  ( 1 min )
  • Open

    A.I. that turns written documents into practice tests. For easy learning. Is this easy or challenging programming?
    submitted by /u/143openyourmind [link] [comments]
    Check out this research summary article based on the paper 'SS-SAM: Stochastic Scheduled Sharpness-Aware Minimization for Efficiently Training Deep Neural Networks' where Researchers From Tsinghua University Propose ‘Stochastic Scheduled SAM’ (SS-SAM) for reducing the computational overhead
    Deep Neural Networks (DNNs) have excelled at solving complex real-world problems; however, training a good DNN has become more complex. It is challenging to ensure that the optimizers used will converge to reliable minima with acceptable model performance when only minimizing the conventional empirical loss. Tsinghua University’s research team proposes Stochastic Scheduled SAM (SS-SAM), a novel and effective DNN training strategy. In SS-SAM, the optimizer is set up by a predetermined scheduling function to run a random trial at each update step, which selects whether to perform the SGD or SAM optimization at random. The overall number of propagation pairs can be significantly decreased in this approach. The team’s approach provides equivalent or higher model training performance at a lower computational cost than baseline sharpness-aware minimization (SAM). Continue Reading Paper: https://arxiv.org/pdf/2203.09962.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    What are easy to use image editing AIs?
    submitted by /u/xXLisa28Xx [link] [comments]
    Artificial Nightmares: Limgrave || Clip Guided Diffusion AI Art Video [4K 16 FPS]
    submitted by /u/Thenamessd [link] [comments]
    NVIDIA Research Turns 2D Photos Into 3D Scenes in the Blink of an AI
    submitted by /u/qptbook [link] [comments]
    This endless live TV show run entirely by AI characters
    submitted by /u/the_embassy_official [link] [comments]  ( 1 min )
    7 Best Natural Language Processing Courses (2022) | Best NLP Courses -
    submitted by /u/sivasiriyapureddy [link] [comments]
    If you have expertise in this field, is it realistic whatsoever to create an A.I version of yourself to live on (let's say work on it for 30 years, starting from now)?
    As in...get the voice down from improving an A.I recording. Then, give it some basic code or responses that you had at different ages. Also, getting a bunch of 3D images of yourself. Then coding it with some basic "values" and creating some sort of generic conditional statement for the 3 basic values that it has that match yours. Then, over time, actually diving into artificial intelligence and slowly updating and replacing those to continue improving it to match you (and your growth)? Hmmm, I wonder if there would be a way to preserve it. Everything changes (like sites -> VR and so on). So some sort of "survival" instinct (which seems impossible to code but would be fun to try undertaking). submitted by /u/the_evil_intp [link] [comments]  ( 2 min )
    Future after Automation and AGI
    submitted by /u/HumanSeeing [link] [comments]  ( 2 min )
    Face filters on the web from just text descriptions
    submitted by /u/pmz [link] [comments]
    👉 Impressed With AlphaFold? Checkout This Protein Structure Prediction Model (FastFold) That Reduces AlphaFold’s Training Time From 11 Days To 67 Hours
    DeepMind released AlphaFold 2 last year, which made headlines for its incredible accuracy in protein structure prediction. The success of AlphaFold demonstrated that deep neural networks might be used to solve challenging and critical structural biology problems. FastFold is a highly effective protein structure prediction model formulation for training and inference developed by a group of researchers from the National University of Singapore. Although AlphaFold 2 is a game-changer in protein structure prediction, training and inference remain time-consuming and costly. This is something that the study team is concerned about. Continue Reading This Article Here Paper: https://arxiv.org/pdf/2203.00854v1.pdf Github: https://github.com/hpcaitech/FastFold submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
  • Open

    why is the variable being passed for iterations suddenly passing empty value?
    As you can see in the figure, I'm passing the variable data_inputs to the feedforward_comp function over iterations. Up to epoch = 9, everything computes fine, but after that, data_inputs is suddenly passed in empty. Can someone please tell me why this happens and how to fix it? https://preview.redd.it/2wds226ktup81.png?width=816&format=png&auto=webp&s=61c55db011708e2d2b373a0b4e4b02355ec50730 submitted by /u/lwhisper [link] [comments]  ( 1 min )
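    Without seeing the full code it's impossible to be sure, but "fine for exactly N epochs, then empty" is the classic signature of a function that consumes its input: either an iterator/generator that is exhausted after one pass, or a function that mutates the shared list in place. A hypothetical reproduction of the second case (all names below are invented to mirror the post, not taken from it):

```python
# Hypothetical: feedforward_comp removes one sample from the shared list per
# call, so after len(data_inputs) epochs the caller's list is empty.
def feedforward_comp(batch):
    sample = batch.pop()   # mutates the caller's list in place!
    return sample * 2

data_inputs = list(range(10))
for epoch in range(12):
    if not data_inputs:
        print(f"epoch {epoch}: data_inputs is empty")
        break
    feedforward_comp(data_inputs)
# -> epoch 10: data_inputs is empty
```

The fix is either to pass a copy (feedforward_comp(list(data_inputs))) or, better, to iterate without mutating. A generator fed in as data_inputs fails the same way, since a generator yields its values only once.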
    Which laptop should I buy?
    Hey guys, I really need your help on this one. I go to university next year and will be studying computer engineering, which means there will be a lot of coding going into it. Now I have 4 options for a laptop. First: INSPIRON 15.6" INTEL CORE i7-1165G7 TOUCHSCREEN 2-IN-1 LAPTOP Second: GALAXY BOOK PRO 360 15" 2-IN-1 INTEL i7 LAPTOP Third: HP 15.6" Touchscreen 2-in-1 Laptop - Nightfall Black (AMD Ryzen 7 5700U/1TB SSD/16GB RAM/Windows 10) Fourth: HP Pavilion x360 15.6" Touchscreen 2-in-1 Laptop - Silver (Intel Core i7-1165G7/1TB SSD/16GB RAM/Win 11) Please and thank you guys :) submitted by /u/Traditional-Cow47 [link] [comments]  ( 2 min )

    [P] We built an AI platform to advance stereotactic radiosurgery (SRS) for brain tumor patients
    4 years ago, I posted here to introduce some work I did using AI for breast tumor detection and classification: https://www.reddit.com/r/MachineLearning/comments/8rdpwy/pi_made_a_gpu_cluster_and_free_website_to_help/ That post gained some traction on Reddit, and I hope you will like the one I am going to introduce here. In recent years, I have been shifting my focus from cancer detection to the actual treatment. One particular problem we really want to solve is making stereotactic radiosurgery (SRS) accessible to more brain cancer patients; it offers a much better treatment outcome and quality of life (QoL) for the patient than whole-brain radiotherapy (WBRT), which is more common for patients with multiple brain metastases (say, more than 5 or 10). The reason behind …  ( 2 min )
    [D] Is Colab Pro Worth the money?
    Hey guys, I'm currently working on my bachelor's degree final project. But my PC seems slow, it gets really hot, and I think it might be dying; I really need to send it in for service. :( Well, I'm not familiar with other cloud services like Azure or AWS, but I have used Google Colab a lot, and I'm using it right at this moment. But it's constantly asking if I'm "there". It always wants an interaction; otherwise it shuts down the session and my time gets wasted, and I have to do everything again from the start. So if I pay for the Colab Pro (not Pro+) version, will my experience get better? Will I still need to interact with Colab every hour? Or should I consider other alternatives? submitted by /u/average_turanist [link] [comments]  ( 1 min )
    [D] Machine Learning - WAYR (What Are You Reading) - Week 134
    This is a place to share machine learning research papers, journals, and articles that you're reading this week. If it relates to what you're researching, by all means elaborate and give us your insight, otherwise it could just be an interesting paper you've read. Please try to provide some insight from your understanding and please don't post things which are present in wiki. Preferably you should link the arxiv page (not the PDF, you can easily access the PDF from the summary page but not the other way around) or any other pertinent links. Previous weeks : 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100 101-110 111-120 121-130 131-140 Week 1 Week 11 Week 21 Week 31 Week 41 Week 51 Week 61 Week 71 Week 81 Week 91 Week 101 Week 111 Week 121 Week 131 Week 2 Week 12 Week 22 Week 32 Week 42 Week 52 Week 62 Week 72 Week 82 Week 92 Week 102 Week 112 Week 122 Week 132 Week 3 Week 13 Week 23 Week 33 Week 43 Week 53 Week 63 Week 73 Week 83 Week 93 Week 103 Week 113 Week 123 Week 133 Week 4 Week 14 Week 24 Week 34 Week 44 Week 54 Week 64 Week 74 Week 84 Week 94 Week 104 Week 114 Week 124 Week 5 Week 15 Week 25 Week 35 Week 45 Week 55 Week 65 Week 75 Week 85 Week 95 Week 105 Week 115 Week 125 Week 6 Week 16 Week 26 Week 36 Week 46 Week 56 Week 66 Week 76 Week 86 Week 96 Week 106 Week 116 Week 126 Week 7 Week 17 Week 27 Week 37 Week 47 Week 57 Week 67 Week 77 Week 87 Week 97 Week 107 Week 117 Week 127 Week 8 Week 18 Week 28 Week 38 Week 48 Week 58 Week 68 Week 78 Week 88 Week 98 Week 108 Week 118 Week 128 Week 9 Week 19 Week 29 Week 39 Week 49 Week 59 Week 69 Week 79 Week 89 Week 99 Week 109 Week 119 Week 129 Week 10 Week 20 Week 30 Week 40 Week 50 Week 60 Week 70 Week 80 Week 90 Week 100 Week 110 Week 120 Week 130 Most upvoted papers two weeks ago: /u/CatalyzeX_code_bot: Paper link /u/PaganPasta: https://arxiv.org/abs/2105.05233 Besides that, there are no rules, have fun. submitted by /u/ML_WAYR_bot [link] [comments]  ( 1 min )
    [N][R] Combine Lidar and Cameras for 3D object detection - Waymo & Google Research
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    [Discussion] Interesting prediction of ecosystem around giant DNN models
    Saw this comment in a Q&A on big deep learning models. This seems to have side-effects both good and bad. The prediction, if true, forces accessibility (though expensive) but also creates silos. First time poster. What do you guys think? https://m12.vc/news/direct-line-with-saurabh-tiwary-whats-next-for-large-foundational-models The economics are making it untenable for most people except the most well-funded organizations to invest in large language models. I will make the comparison to the semiconductor ecosystem. If you look at fabrication economics for semiconductor chips, they cost tens to hundreds of millions of dollars and have relatively short lifetimes. One needs very large volume usage to justify manufacturing a custom ASIC (Application Specific Integrated Circuits). Thus, we do not have that many companies fabricating chips. However, we have an entire software and systems eco-system which relies on these chips that have built massive industries around them. And, if you look at the biggest companies in the world (maybe, except Apple), they have very little to do with ASIC design and fabrication as part of their core business. I think a similar eco-system would pan out in the large-scale modeling space as well. We would have a few well-funded companies that would be training these extremely large and reusable models and other companies would build applications and services reusing and customizing these models. submitted by /u/SufficientActive8895 [link] [comments]  ( 1 min )
    [D] Modern data augmentation techniques
    I've written a short blog post on modern data augmentation techniques. Please have a read and provide feedback. I've explained Cutout, Mixup, CutMix and Label smoothing with code and examples. https://pmgautam.com/augmentations/2022/03/27/Augmentations-visually-explained.html submitted by /u/p1g1 [link] [comments]  ( 2 min )
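For a flavour of the techniques the post covers, here is a minimal NumPy sketch of Mixup and label smoothing (function names and defaults are illustrative, not taken from the linked post):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: blend two examples and their one-hot labels with a
    Beta(alpha, alpha)-distributed weight."""
    if rng is None:
        rng = np.random.default_rng(0)  # seeded here only for reproducibility
    lam = float(rng.beta(alpha, alpha))
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def label_smoothing(one_hot, eps=0.1):
    """Label smoothing: replace hard 0/1 targets with
    (eps / K, ..., 1 - eps + eps / K) over K classes."""
    k = one_hot.shape[-1]
    return one_hot * (1 - eps) + eps / k
```

Cutout and CutMix follow the same spirit but operate on image patches rather than whole-image blends; the blog walks through all four with visual examples.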
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 1 min )
    [P] Author pages on http://papers.labml.ai
    We added author pages, which list all the papers by an author together with links to their social and academic web pages. E.g.: https://papers.labml.ai/author/39815586a03711ecbb8c3d25c114d5ed https://papers.labml.ai/author/56b63a47a03711ecbb8c3d25c114d5ed Highlights: links to Google and arXiv searches; sorting papers by published date and popularity on Twitter; links to Twitter, Google Scholar, GitHub, LinkedIn, etc., if available in our database. We'd love to hear your feedback and suggestions. Thank you all, and I appreciate the support. submitted by /u/hnipun [link] [comments]  ( 1 min )
    [R][P] GroupViT: Semantic Segmentation Emerges from Text Supervision + Hugging Face Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )

    The first open-source project for financial reinforcement learning
    submitted by /u/zicxor [link] [comments]  ( 1 min )
    Using RL to play Jump King
    I am learning RL by having the algorithm play Jump King and am streaming it on twitch, having the chat play against it as well to see who can get the babe first. Check it out at: https://www.twitch.tv/unassignedseat submitted by /u/UnassginedSeat [link] [comments]  ( 1 min )
    [Question][DRL] Are intermediate activations used during training?
    Hello all, I have a question regarding optimizing a policy represented by a neural network. In supervised learning, the intermediate activations created during the forward pass are needed during backpropagation in order to compute weight gradients. This has motivated a number of memory-management techniques, such as activation offloading and checkpointing. My question is whether the same is true in DRL. For policy-gradient methods, for example, learning starts from an objective computed from the trajectory, such as the discounted returns, but are the intermediate activations created during action inference needed when optimizing the policy (i.e., learning)? Is there any academic source that covers this topic? Thanks! submitted by /u/PSylvan [link] [comments]  ( 1 min )
    RaveForce in 2022: The OpenAI Gym style toolkit for music generation experiments just got better
    submitted by /u/chaosprint [link] [comments]
    Is my problem suited for solving via reinforcement learning methods? What approach should I start with?
    My goal is to determine the best course of actions to take given a certain state. I'm working in some state space X. For every x in X, I can assign it a value. When I perform an action a given x, I map x to some new state x'. The state x' depends on my action up to some noise produced by the environment. I think this is a reinforcement learning problem. What methods are suitable in this context? submitted by /u/heylanguage [link] [comments]  ( 1 min )
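The setup described (states, actions, stochastic transitions, a value for each state) is the standard MDP framing used in RL. If the state and action sets are small and discrete, tabular Q-learning is a common starting point before reaching for deep methods; a minimal sketch of the update rule (the dict-of-dicts table layout and step sizes are illustrative):

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q[s][a] toward the bootstrapped target
    r + gamma * max_a' Q[s'][a']. Q is a dict mapping state -> {action: value}."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]
```

If the state space is continuous or large, the same target is used with a function approximator (DQN-style methods); if the noise in the transitions is heavy, policy-gradient methods are another standard family to try.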
    Deck generation using reinforcement learning
    A brief overview of the game: a deck of 50 objects is chosen from a set of ~1000 objects. The game is then played out deterministically, and rewards are dished out based on win/loss. I would like to build a NN that can produce good decks, trained using self-play. However, I'm not too sure how to approach this problem. Relevant research or pointers would be very helpful. Thanks. submitted by /u/nutpeabutter [link] [comments]  ( 1 min )

    Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future. (arXiv:2106.04420v7 [cs.LG] UPDATED)
    In real-time forecasting in public health, data collection is a non-trivial and demanding task. Often, after data is initially released, it undergoes several revisions (perhaps due to human or technical constraints); as a result, it may take weeks until the data reaches a stable value. This so-called 'backfill' phenomenon and its effect on model performance have been barely studied in the prior literature. In this paper, we introduce the multi-variate backfill problem using COVID-19 as the motivating example. We construct a detailed dataset composed of relevant signals over the past year of the pandemic. We then systematically characterize several patterns in backfill dynamics and leverage our observations to formulate a novel problem and neural framework, Back2Future, that aims to refine a given model's predictions in real-time. Our extensive experiments demonstrate that our method refines the performance of top models for COVID-19 forecasting, yielding an 18% improvement over non-trivial baselines and enabling us to obtain a new SOTA performance. In addition, we show that our model improves model evaluation too; hence policy-makers can better understand the true accuracy of forecasting models in real-time.  ( 3 min )


    I created a new AI art maker program (free)
    submitted by /u/Recent_Coffee_2551 [link] [comments]
    How Does a Self-Driving Car See? (Waymo ‘s system explained)
    submitted by /u/OnlyProggingForFun [link] [comments]
    5 Best Movies Like 'After Yang' About Artificial Intelligence (A.I.)
    submitted by /u/NarutoNotBoruto [link] [comments]
    ai for enamels and paints
    Love how my keyboard conked out. So I recently read about MegaSyn, the AI that was used to generate 40,000 candidate toxic molecules (potential chemical weapons) in 6 hours. This got me wondering: can AI be made to create different enamels and paints for things like pottery glazes, cloisonné enamels, and paints for artwork? If so, how would one go about doing this, knowing literally nothing? submitted by /u/Grendal87 [link] [comments]  ( 1 min )
    Useful Tools and Programs list for AI/ML
    Found a useful list of Tools and Programs for AI/ML. Looks like it covers Machine Learning, Deep Learning, Computer Vision(CV), and Natural Language Processing (NLP). I thought I'd share it for anyone that's interested. https://github.com/mikeroyal/Machine-Learning-Guide submitted by /u/Khaotic_Kernel [link] [comments]
    GPT-3's knowledge was limited to the world until 2019. InstructGPT is apparently up-to-date.
    submitted by /u/BeginningInfluence55 [link] [comments]
    Crystal Forest AI Art
    submitted by /u/Recent_Coffee_2551 [link] [comments]
    Deep Convolutional Generative Network Tutorial in PyTorch
    I thought it would be quite interesting to see a Deep Convolutional GAN's capability in generating wildlife, so I wrote a tutorial on how to build a model based on the DCGAN architecture in PyTorch: https://taying-cheng.medium.com/create-new-animals-using-dcgan-with-pytorch-2ce47810ebd4 submitted by /u/Ok-Peanut-2681 [link] [comments]
    AI News | Animal Language Translator AI | Heart Attack Prediction Algo | Nvidia H100 GPU & AI Supercomputer
    submitted by /u/getrich_or_diemining [link] [comments]
    Researchers Open-Source WiSE-FT Algorithm For Fine Tuning AI Models
    When making zero-shot inferences, large pre-trained models like CLIP or ALIGN provide consistent accuracy across various data distributions (i.e., without fine-tuning on a specific dataset). While existing fine-tuning methods vastly improve accuracy on a given target distribution, they frequently compromise robustness to distribution shifts. This conflict can be resolved with a simple and effective strategy for enhancing robustness while fine-tuning: ensembling the weights of the zero-shot and fine-tuned models (WiSE-FT). This approach to fine-tuning AI models, which enhances robustness under distribution shift, has been open-sourced by researchers from the University of Washington (UW), Google Brain, and Columbia University. According to tests, WiSE-FT improves accuracy by up to 6% on specific computer vision (CV) benchmarks. Continue Reading The Research Summary Article Here Paper: https://arxiv.org/pdf/2109.01903.pdf Github: https://github.com/mlfoundations/wise-ft https://preview.redd.it/1ltzmg5iwnp81.png?width=2803&format=png&auto=webp&s=6c0727432072b67c1723838e3097ec901f34b1c0 submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
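The core of the weight-space ensemble can be sketched in a few lines, assuming the two checkpoints share the same architecture and parameter names (the dict-of-arrays representation below is an illustrative stand-in for a real framework state dict):

```python
import numpy as np

def wise_ft(theta_zeroshot, theta_finetuned, alpha=0.5):
    """WiSE-FT: interpolate zero-shot and fine-tuned weights parameter-wise.

    alpha = 0 recovers the zero-shot model and alpha = 1 the fine-tuned one;
    intermediate values trade target-distribution accuracy against
    robustness to distribution shift.
    """
    return {name: (1 - alpha) * theta_zeroshot[name] + alpha * theta_finetuned[name]
            for name in theta_zeroshot}
```

Because the ensemble happens in weight space rather than output space, inference cost is the same as for a single model.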

    [D] Conditional GAN loss magnitudes
    Hey, I am wondering about the losses of conditional GANs, particularly their magnitude. Due to the amount of classes/identities, the classification loss will usually be significantly higher than discrimination loss (data identified as real or generated). When using traditional multitask learning where losses are simply summed up, how is the generator supposed to learn to generate realistic-appearing data when the loss that would encourage that is so low in comparison to the classification loss? submitted by /u/Timboron [link] [comments]  ( 1 min )
    [D] Augmentation in GAN
    Hi, does anyone has experience in augmentation for GANs? Especially for Cycle-GAN like image to image translation. When I see image-to-image GANs, there is mostly no augmentation applied. submitted by /u/SeucheAchat9115 [link] [comments]  ( 1 min )
    [Discussion] How does apple FaceID work?
    I am doing some face recognition and am wondering how Apple's Face ID works. They say it uses a "TrueDepth" camera. What does that mean? Is that a lidar? Some kind of dot projector? Basically, what I want to know is what kind of data that device provides. Also, is there a commercially available device similar to that TrueDepth camera? submitted by /u/user89320 [link] [comments]  ( 1 min )
    [P] Keep training GAN?
    Hi there, I'm trying to train a GAN to generate video game portraits (think Baldur's Gate, Divinity, that kind of stuff). My GAN is trained on 4096 portraits, 128x192 pixels in size. Batch size is 64, 64 features, 128-dim noise vector. After a few epochs of the expected low-quality random stuff, my generator starts producing images where you can kind of make out silhouettes and faces. Here's after 250 epochs: https://preview.redd.it/afc4i2r6zpp81.png?width=568&format=png&auto=webp&s=f8e386a8da4976a20b8db525914e5944346b9240 But after 500 the results are pretty much the same: https://preview.redd.it/k9ikn9k7zpp81.png?width=568&format=png&auto=webp&s=3ec106e5ef5d54c3a8162bb45941fb3f3f9af4a2 Losses seem to be mostly stable fairly quickly too (sorry, I lost the graph after I accidentally shut down my computer - I save the model every 25 epochs but not the loss graph): Epoch 255 Step 16320: Generator loss: 7.4651877045631405, critic loss: -9.97341676712036 ... Epoch 310 Step 19840: Generator loss: 7.401800179481507, critic loss: -9.479909801483156 ... Epoch 511 Step 32720: Generator loss: 11.518763446807862, critic loss: -8.105796337127687 (The 7.5 -> 11.5 G-loss looks pretty much flat on the graph. Maybe the huge initial losses put it out of perspective?) Is my GAN "stuck", or do I just need to keep training, with quality gains from this point on simply being slower? submitted by /u/-Anordil- [link] [comments]  ( 1 min )
    [D] Guidelines on how to add skip connections to DCGAN generator?
    I'm experimenting with the DCGAN architecture and tried adding residual blocks before each upscaling step. The DCGAN is training via a WGAN-GP procedure, and so far I could not really get any sensible result. Is there any guideline for how skip connections should be implemented in GANs? I'm using a very standard ResNet-type skip connection. The FID score keeps worsening since the start of training, and I've tweaked a lot of hyperparameters, but I still don't see any improvements. Although I could not train that many epochs because I'm training this on Colab. You can find the code here to see the architecture: Generator and Critic. Another piece of info: the dataset is roughly 100K book covers I've scraped from the internet. submitted by /u/feryet [link] [comments]  ( 1 min )
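One common pattern is to keep the skip connection at a fixed resolution inside each block and, when the block changes resolution, make the shortcut a plain (e.g. nearest-neighbour) upsample so the learned branch only has to model a correction. A NumPy sketch of that structure, with `conv_branch` standing in for the learned layers (a guideline sketch, not a drop-in fix for the linked code):

```python
import numpy as np

def nearest_upsample(x, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def residual_upsample_block(x, conv_branch, factor=2):
    """Residual block around an upsampling step: the shortcut is a plain
    upsample, so `conv_branch` (the learned convolutions, here a placeholder
    callable) only needs to model the residual correction on top of it."""
    shortcut = nearest_upsample(x, factor)
    return shortcut + conv_branch(shortcut)
```

If the residual branch changes the channel count, the shortcut also needs a cheap 1x1 projection so the addition is shape-compatible; mismatched shapes (or missing normalization inside the branch) are frequent causes of the kind of divergence described.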
    [D] Intersection between Computer Engineering and Machine Learning
    Hi everyone. I am pursuing Masters in Computer Engineering. However, I did my Bachelor's in Computer Science and dabbled with ML a bit. I would like to continue with it during my Masters as well. I am wondering if there is an intersection between the disciplines. If the two fields share considerable overlap, would it be possible to pursue research in ML as a CE graduate? Thanks submitted by /u/RealMatchesMalonee [link] [comments]  ( 1 min )
    Why does drug discovery with machine learning work? [D]
    Machine learning models are statistical, not causal, in nature. That means they don't necessarily hold under intervention. This should make predicting properties of drugs particularly problematic, because we are not drawing further test samples from a distribution similar to the training distribution; we are completely changing the input as we wish. Why is machine learning/deep learning successful at predicting these properties when the wider research community is struggling to make deep learning models robust, never mind causal, in general? submitted by /u/lemlo100 [link] [comments]  ( 4 min )
    [R] Text-to-Image Generation Model with 3.9B Parameters is Publicly Available
    Kakao Brain's state-of-the-art autoregressive image generation model! The paper and code for "Autoregressive Image Generation using Residual Quantization", accepted to CVPR'22, have been released. Our study outperforms previous autoregressive image generation models while increasing the sampling speed up to ~7x. In addition, we release RQ-Transformer with 3.9B parameters trained on 30M text-image pairs. To the best of our knowledge, it is the largest text-to-image model among publicly available models. Examples of generated images are in the paper, including samples from the 3.9B-parameter RQ-Transformer. The model is publicly available now. Enjoy! Paper: https://arxiv.org/abs/2203.01941 Code: https://github.com/kakaobrain/rq-vae-transformer submitted by /u/leedoyup [link] [comments]  ( 2 min )
    [D] Why are GPU workstations cheaper than rack-mount servers?
    I'm in the process of looking for a good GPU server for my university lab. We have a server room to install the new server in. However, while looking into different vendors, I noticed that workstations cost less than rack-mount servers even when they have better specifications (I saw a GPU workstation with a 48-core 3.8 GHz processor that still costs a few thousand less than a rack server with the same number and type of GPUs but a lower-grade processor with only 24 cores at 2.5 GHz). I was really surprised. Is there an advantage to buying a rack-mount GPU server over a GPU workstation, given that it costs more for similar or lower specifications? submitted by /u/majax21 [link] [comments]  ( 2 min )
    [R][P] Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer + Hugging Face Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [D] Medium Article on Deep Learning for Tabular Data - Your thoughts on which algorithms (DeepInsight, IGTD, SuperTML, TabNet, Tab-Transformer, AutoInt, FT-Transformer ) works most times?
    I recently wrote a critical review on "Deep Learning for Tabular Data", which examines whether we are ready to move from tree-based models to neural-network-based models for tabular data. It covers many novel approaches such as DeepInsight, IGTD and SuperTML. It also includes some recent transformer-based works such as TabNet, Tab-Transformer, AutoInt and FT-Transformer, and regularisation models such as MLP+. What I have most commonly found is the lack of a defined benchmark, which makes it hard for people to find the right algorithm for the task. I am creating this discussion so that people who are using some of these algorithms, or have tested some of them in different scenarios, can share their findings. submitted by /u/Raghuvansh_Tahlan [link] [comments]  ( 1 min )

    Some confusions about Actor-Critic, A2C
    In Sutton's book, actor-critic first uses an approximate value function as a baseline and uses the error to update the policy. In my opinion, the value function is used as a baseline since it assigns high value to states with high expected return, and every action in a given state shares the same baseline. Pseudocode in Sutton's book. But I also see a Q-version of the AC algorithm, which uses the Q function as the baseline. In this algorithm, the Q function is used to update the policy, and the TD error is used to update the Q function. How do we get this? Q Actor-Critic. Another question is about AC and A2C. Is expected return (G) minus the baseline the same as the advantage function in A2C? If so, is AC with a baseline the same as A2C? submitted by /u/ZavierTi2021 [link] [comments]  ( 1 min )
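For reference, the one-step actor-critic update from Sutton's pseudocode can be sketched in tabular form, where the TD error serves both as the critic's learning signal and as the advantage estimate in the actor's update (the step sizes and softmax parameterization below are illustrative choices, not the only ones):

```python
import numpy as np

def actor_critic_update(V, theta, s, a, r, s_next, done,
                        gamma=0.99, alpha_v=0.1, alpha_pi=0.01):
    """One-step actor-critic, tabular sketch.

    V[s] is the state-value baseline; theta[s] holds action preferences,
    turned into a policy by softmax. The same TD error
    delta = r + gamma * V(s') - V(s) updates both critic and actor.
    """
    td_error = r + (0.0 if done else gamma * V[s_next]) - V[s]
    V[s] += alpha_v * td_error                       # critic step
    pi = np.exp(theta[s] - theta[s].max())
    pi /= pi.sum()                                   # softmax policy at s
    grad_log = -pi
    grad_log[a] += 1.0                               # grad of log pi(a|s) for softmax
    theta[s] += alpha_pi * td_error * grad_log       # actor step
    return td_error
```

Note the TD error is an estimate of the advantage, since E[r + gamma * V(s')] = Q(s, a), so delta estimates Q(s, a) - V(s); this is the sense in which the baseline version and the advantage (A2C-style) view coincide in the one-step case.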
    Is deep rl possible on microcontrollers?
    I am thinking of applying deep RL to small-scale robots. I have an Arduino Uno and some servos; is it possible to apply deep RL using an Arduino? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
    A possibly stupid question about deep q-learning
    Hi guys! I am just starting out in RL and I have a possibly stupid question about deep Q-learning. Why do all of the code examples train the model on its own discounted prediction plus the reward, when they could just record all of the rewards in an episode and then calculate the total discounted returns from the actual rewards the agent got in that episode? At least in my implementations, the latter strategy seems to outperform the former, both in the time it took the model to converge and in the quality of the learned policy. submitted by /u/KayJersch [link] [comments]  ( 2 min )
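The two targets being contrasted can be written side by side; a minimal sketch with nothing framework-specific assumed:

```python
def discounted_returns(rewards, gamma):
    """Monte Carlo targets: G_t = r_t + gamma * G_{t+1}, computed backwards
    from the recorded rewards of a finished episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

def td_target(r, q_next_max, gamma, done):
    """Bootstrapped one-step target used by standard deep Q-learning:
    r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states."""
    return r if done else r + gamma * q_next_max
```

The Monte Carlo targets are unbiased but high-variance and only available after the episode ends; the bootstrapped target is biased by the network's own estimates but usable online and off-policy, which is one reason it dominates the standard examples even when Monte Carlo returns work well on a particular task.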
    What exactly is the output of openai gym atari vram outputs?
    The docs are light, and I understand they're being revamped, but I can't find a definition of the outputs for ALE. I understand it depends on the specific environment; e.g., for non-Atari envs like Lunar Lander it gives positional data, but for the Atari games the docs state nothing other than that it's a memory dump. Do I treat it like an image, with no processing necessary? Can I reshape it into a matrix the size of the raw image output and throw it into a series of convolutional layers? Or do I treat it as positional data, like the locations of all the objects? submitted by /u/clockface99 [link] [comments]  ( 2 min )
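For the `-ram` variants, the observation is the Atari 2600's 128-byte RAM as a flat uint8 vector, not pixels, so there is no image to reshape; it is usually scaled and fed to fully connected layers rather than convolutions. A minimal sketch (the commented `gym` call is illustrative usage, and environment IDs vary across gym versions):

```python
import numpy as np

def preprocess_ram(ram):
    """Scale the 128-byte RAM dump to [0, 1] floats for a dense network."""
    ram = np.asarray(ram, dtype=np.float32)
    assert ram.shape == (128,), "ALE RAM observations are flat 128-byte vectors"
    return ram / 255.0

# Typical usage (requires gym with the Atari extras installed):
# env = gym.make("Breakout-ram-v4")
# obs = env.reset()          # obs.shape == (128,), dtype uint8
# x = preprocess_ram(obs)    # feed to fully connected layers
```

The byte-to-meaning mapping (which RAM addresses hold object positions, scores, etc.) is game-specific and undocumented in the env itself, so the usual choices are either to let the network learn from the raw bytes or to use the image-based variant with convolutions instead.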

    Beyond Fixation: Dynamic Window Visual Transformer. (arXiv:2203.12856v1 [cs.CV])
    Recently, a surge of interest in visual transformers has focused on reducing the computational cost by limiting the calculation of self-attention to a local window. Most current work uses a fixed single-scale window for modeling by default, ignoring the impact of window size on model performance. However, this may limit the modeling potential of these window-based models for multi-scale information. In this paper, we propose a novel method, named Dynamic Window Vision Transformer (DW-ViT). The dynamic window strategy proposed by DW-ViT goes beyond models that employ a fixed single-window setting. To the best of our knowledge, we are the first to use dynamic multi-scale windows to explore the upper limit of the effect of window settings on model performance. In DW-ViT, multi-scale information is obtained by assigning windows of different sizes to different head groups of window multi-head self-attention. Then, the information is dynamically fused by assigning different weights to the multi-scale window branches. We conducted a detailed performance evaluation on three datasets: ImageNet-1K, ADE20K, and COCO. Compared with related state-of-the-art (SoTA) methods, DW-ViT obtains the best performance. Specifically, compared with the current SoTA Swin Transformer, DW-ViT achieves consistent and substantial improvements on all three datasets with similar parameters and computational costs. In addition, DW-ViT exhibits good scalability and can be easily inserted into any window-based visual transformer.  ( 2 min )
    Your Policy Regularizer is Secretly an Adversary. (arXiv:2203.12592v2 [cs.LG] UPDATED)
    Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy. In this paper, we show how this robustness arises from hedging against worst-case perturbations of the reward function, which are chosen from a limited set by an imagined adversary. Using convex duality, we characterize this robust set of adversarial reward perturbations under KL and alpha-divergence regularization, which includes Shannon and Tsallis entropy regularization as special cases. Importantly, generalization guarantees can be given within this robust set. We provide detailed discussion of the worst-case reward perturbations, and present intuitive empirical examples to illustrate this robustness and its relationship with generalization. Finally, we discuss how our analysis complements and extends previous results on adversarial reward robustness and path consistency optimality conditions.  ( 2 min )
    Explainable Artificial Intelligence for Exhaust Gas Temperature of Turbofan Engines. (arXiv:2203.13108v1 [cs.LG])
    Data-driven modeling is an imperative tool in various industrial applications, including many applications in the sectors of aeronautics and commercial aviation. These models are in charge of providing key insights, such as which parameters are important on a specific measured outcome or which parameter values we should expect to observe given a set of input parameters. At the same time, however, these models rely heavily on assumptions (e.g., stationarity) or are "black box" (e.g., deep neural networks), meaning that they lack interpretability of their internal working and can be viewed only in terms of their inputs and outputs. An interpretable alternative to the "black box" models, with considerably fewer assumptions, is symbolic regression (SR). SR searches for the optimal model structure while simultaneously optimizing the model's parameters, without relying on an a-priori model structure. In this work, we apply SR on real-life exhaust gas temperature (EGT) data, collected at high frequencies through the entire flight, in order to uncover meaningful algebraic relationships between the EGT and other measurable engine parameters. The experimental results exhibit promising model accuracy as well as explainability, returning an absolute difference of 3°C compared to the ground truth and demonstrating consistency from an engineering perspective.  ( 2 min )
    Self-supervised Representation Learning for Reliable Robotic Monitoring of Fruit Anomalies. (arXiv:2109.10135v2 [cs.RO] UPDATED)
    Data augmentation can be a simple yet powerful tool for autonomous robots to fully utilise available data for self-supervised identification of atypical scenes or objects. State-of-the-art augmentation methods arbitrarily embed "structural" peculiarity on typical images so that classifying these artefacts can provide guidance for learning representations for the detection of anomalous visual signals. In this paper, however, we argue that learning such structure-sensitive representations can be a suboptimal approach to some classes of anomaly (e.g., unhealthy fruits) which could be better recognised by a different type of visual element such as "colour". We thus propose Channel Randomisation as a novel data augmentation method for restricting neural networks to learn encoding of "colour irregularity" whilst predicting channel-randomised images to ultimately build reliable fruit-monitoring robots identifying atypical fruit qualities. Our experiments show that (1) this colour-based alternative can better learn representations for consistently accurate identification of fruit anomalies in various fruit species, and also, (2) unlike other methods, the validation accuracy can be utilised as a criterion for early stopping of training in practice due to positive correlation between the performance in the self-supervised colour-differentiation task and the subsequent detection rate of actual anomalous fruits. Also, the proposed approach is evaluated on a new agricultural dataset, Riseholme-2021, consisting of 3.5K strawberry images gathered by a mobile robot, which we share online to encourage active agri-robotics research.  ( 2 min )
    USCO-Solver: Solving Undetermined Stochastic Combinatorial Optimization Problems. (arXiv:2107.07508v2 [cs.LG] UPDATED)
    Real-world decision-making systems are often subject to uncertainties that have to be resolved through observational data. Therefore, we are frequently confronted with combinatorial optimization problems of which the objective function is unknown and thus has to be debunked using empirical evidence. In contrast to the common practice that relies on a learning-and-optimization strategy, we consider the regression between combinatorial spaces, aiming to infer high-quality optimization solutions from samples of input-solution pairs -- without the need to learn the objective function. Our main deliverable is a universal solver that is able to handle abstract undetermined stochastic combinatorial optimization problems. For learning foundations, we present learning-error analysis under the PAC-Bayesian framework using a new margin-based analysis. In empirical studies, we demonstrate our design using proof-of-concept experiments, and compare it with other methods that are potentially applicable. Overall, we obtain highly encouraging experimental results for several classic combinatorial problems on both synthetic and real-world datasets.  ( 2 min )
    Generalized Few-Shot Semantic Segmentation: All You Need is Fine-Tuning. (arXiv:2112.10982v3 [cs.CV] UPDATED)
    Generalized few-shot semantic segmentation was introduced to move beyond only evaluating few-shot segmentation models on novel classes to include testing their ability to remember base classes. While the current state-of-the-art approach is based on meta-learning, it performs poorly and saturates in learning after observing only a few shots. We propose the first fine-tuning solution, and demonstrate that it addresses the saturation problem while achieving state-of-the-art results on two datasets, PASCAL-5i and COCO-20i. We also show that it outperforms existing methods, whether fine-tuning multiple final layers or only the final layer. Finally, we present a triplet loss regularization that shows how to redistribute the balance of performance between novel and base categories so that there is a smaller gap between them.  ( 2 min )
    DeepEverest: Accelerating Declarative Top-K Queries for Deep Neural Network Interpretation [Technical Report]. (arXiv:2104.02234v7 [cs.DB] CROSS LISTED)
    We design, implement, and evaluate DeepEverest, a system for the efficient execution of interpretation by example queries over the activation values of a deep neural network. DeepEverest consists of an efficient indexing technique and a query execution algorithm with various optimizations. We prove that the proposed query execution algorithm is instance optimal. Experiments with our prototype show that DeepEverest, using less than 20% of the storage of full materialization, significantly accelerates individual queries by up to 63x and consistently outperforms other methods on multi-query workloads that simulate DNN interpretation processes.  ( 2 min )
    Deep Portrait Delighting. (arXiv:2203.12088v2 [cs.CV] UPDATED)
    We present a deep neural network for removing undesirable shading features from an unconstrained portrait image, recovering the underlying texture. Our training scheme incorporates three regularization strategies: masked loss, to emphasize high-frequency shading features; soft-shadow loss, which improves sensitivity to subtle changes in lighting; and shading-offset estimation, to supervise separation of shading and texture. Our method demonstrates improved delighting quality and generalization when compared with the state-of-the-art. We further demonstrate how our delighting method can enhance the performance of light-sensitive computer vision tasks such as face relighting and semantic parsing, allowing them to handle extreme lighting conditions.  ( 2 min )
    A Deep-Discrete Learning Framework for Spherical Surface Registration. (arXiv:2203.12999v1 [cs.CV])
Cortical surface registration is a fundamental tool for neuroimaging analysis that has been shown to improve the alignment of functional regions relative to volumetric approaches. Classically, image registration is performed by optimizing a complex objective similarity function, leading to long run times. This contributes to a convention for aligning all data to a global average reference frame that poorly reflects the underlying cortical heterogeneity. In this paper, we propose a novel unsupervised learning-based framework that converts registration to a multi-label classification problem, where each point in a low-resolution control grid deforms to one of a fixed, finite number of endpoints. This is learned using a spherical geometric deep learning architecture, in an end-to-end unsupervised way, with regularization imposed using a deep Conditional Random Field (CRF). Experiments show that our proposed framework performs competitively, in terms of similarity and areal distortion, relative to the most popular classical surface registration algorithms and generates smoother deformations than other learning-based surface registration methods, even in subjects with atypical cortical morphology.  ( 2 min )
    Bioformers: Embedding Transformers for Ultra-Low Power sEMG-based Gesture Recognition. (arXiv:2203.12932v1 [eess.SP])
Human-machine interaction is gaining traction in rehabilitation tasks, such as controlling prosthetic hands or robotic arms. Gesture recognition exploiting surface electromyographic (sEMG) signals is one of the most promising approaches, given that sEMG signal acquisition is non-invasive and is directly related to muscle contraction. However, the analysis of these signals still presents many challenges, since similar gestures result in similar muscle contractions. Thus, the resulting signal shapes are almost identical, leading to low classification accuracy. To tackle this challenge, complex neural networks are employed, which require large memory footprints, consume relatively high energy and limit the maximum battery life of devices used for classification. This work addresses this problem with the introduction of the Bioformers. This new family of ultra-small attention-based architectures approaches state-of-the-art performance while reducing the number of parameters and operations by 4.9X. Additionally, by introducing a new inter-subject pre-training, we improve the accuracy of our best Bioformer by 3.39%, matching state-of-the-art accuracy without any additional inference cost. Deploying our best-performing Bioformer on a Parallel, Ultra-Low Power (PULP) microcontroller unit (MCU), the GreenWaves GAP8, we achieve an inference latency and energy of 2.72 ms and 0.14 mJ, respectively, 8.0X lower than the previous state-of-the-art neural network, while occupying just 94.2 kB of memory.  ( 2 min )
    Position Tracking using Likelihood Modeling of Channel Features with Gaussian Processes. (arXiv:2203.13110v1 [eess.SP])
Recent localization frameworks exploit spatial information of complex channel measurements (CMs) to estimate accurate positions even in multipath propagation scenarios. State-of-the-art CM fingerprinting (FP)-based methods employ convolutional neural networks (CNNs) to extract the spatial information. However, they need spatially dense data sets (associated with high acquisition and maintenance efforts) to work well -- which is rarely the case in practical applications. If such data is not available (or its quality is low), we cannot compensate for the performance degradation of CNN-based FP, as such methods do not provide statistical position estimates, which prevents fusion with other sources of information at the observation level. We propose a novel localization framework that adapts well to sparse datasets that only contain CMs of specific areas within the environment with strong multipath propagation. Our framework compresses CMs into informative features to unravel spatial information. It then regresses Gaussian processes (GPs) for each of them, which imply statistical observation models based on distance-dependent covariance kernels. Our framework combines the trained GPs with line-of-sight ranges and a dynamics model in a particle filter. Our measurements show that our approach outperforms state-of-the-art CNN fingerprinting (0.52 m vs. 1.3 m MAE) on spatially sparse data collected in a realistic industrial indoor environment.  ( 2 min )
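The distance-dependent covariance kernels behind such GP observation models can be illustrated with a minimal pure-Python sketch. This is a generic noise-free 1-D example with an RBF kernel and a single training observation, not the paper's implementation; all names and hyperparameters here are illustrative.

```python
import math

def rbf(d, sigma_f=1.0, ell=5.0):
    """Distance-dependent covariance: correlation decays as points move apart."""
    return sigma_f ** 2 * math.exp(-(d * d) / (2.0 * ell * ell))

def gp_mean(x_star, x_train, y_train, ell=5.0):
    """Noise-free 1-D GP posterior mean, using a single training point.
    mu(x*) = k(x*, x) / k(x, x) * y  -- the general case replaces the
    scalar division with a solve against the full kernel matrix."""
    return rbf(abs(x_star - x_train), ell=ell) / rbf(0.0, ell=ell) * y_train
```

The key property is that predictions interpolate the data exactly at observed locations and decay toward the prior mean (zero) far away, which is what makes the GP usable as a statistical observation model in a particle filter.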
    VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks. (arXiv:2112.06825v2 [cs.CV] UPDATED)
Recently, fine-tuning language models pre-trained on large text corpora has provided huge improvements on vision-and-language (V&L) tasks as well as on pure language tasks. However, fine-tuning the entire parameter set of pre-trained models becomes impractical since the model size is growing rapidly. Hence, in this paper, we introduce adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VLT5. We evaluate our methods in a unified multi-task setup on both image-text and video-text benchmarks. For the image-text tasks, we use four diverse V&L datasets: VQAv2, GQA, NLVR2, and MSCOCO image captioning. For video-text tasks, we use TVQA, How2QA, TVC, and YC2C. With careful training and thorough experiments, we benchmark three popular adapter-based methods (Adapter, Hyperformer, Compacter) against the standard full fine-tuning and the recently proposed prompt-tuning approach. We also enhance the efficiency and performance of adapters by sharing their weights to attain knowledge across tasks. Our results demonstrate that training the adapter with the weight-sharing technique (4.18% of total parameters for image-text tasks and 3.39% for video-text tasks) can match the performance of fine-tuning the entire model. Lastly, we present a comprehensive analysis including the combination of adapter and task-specific prompts and the impact of V&L pre-training on adapters. Our code is available at: https://github.com/ylsung/VL_adapter.  ( 2 min )
    Improving Maximum Likelihood Difference Scaling method to measure inter content scale. (arXiv:2203.13186v1 [q-bio.NC])
The goal of most subjective studies is to place a set of stimuli on a perceptual scale. This is mostly done directly by rating, e.g. using single or double stimulus methodologies, or indirectly by ranking or pairwise comparison. All these methods estimate the perceptual magnitudes of the stimuli on a scale. However, procedures such as Maximum Likelihood Difference Scaling (MLDS) have shown that considering perceptual distances can bring benefits in terms of discriminatory power, observers' cognitive load, and the number of trials required. One of the disadvantages of the MLDS method is that the perceptual scales obtained for stimuli created from different source content are generally not comparable. In this paper, we propose an extension of the MLDS method that ensures inter-content comparability of the results and demonstrate its usefulness, especially in the presence of observer errors.  ( 2 min )
    Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning. (arXiv:2203.13085v1 [cs.LG])
Distributed training algorithms of deep neural networks show impressive convergence speedup properties on very large problems. However, they inherently suffer from communication-related slowdowns, and communication topology becomes a crucial design choice. Common approaches supported by most machine learning frameworks are: 1) synchronous decentralized algorithms relying on a peer-to-peer All-Reduce topology that is sensitive to stragglers and communication delays, and 2) asynchronous centralised algorithms with a server-based topology that is prone to communication bottlenecks. Researchers have also suggested asynchronous decentralized algorithms designed to avoid the bottleneck and speed up training; however, those commonly use inexact sparse averaging that may lead to a degradation in accuracy. In this paper, we propose Local Asynchronous SGD (LASGD), an asynchronous decentralized algorithm that relies on All-Reduce for model synchronization. We empirically validate LASGD's performance on image classification tasks on the ImageNet dataset. Our experiments demonstrate that LASGD accelerates training compared to SGD and state-of-the-art gossip-based approaches.  ( 2 min )
    The Dutch Draw: Constructing a Universal Baseline for Binary Prediction Models. (arXiv:2203.13084v1 [cs.LG])
Novel prediction methods should always be compared to a baseline to know how well they perform. Without this frame of reference, the performance score of a model is basically meaningless. What does it mean when a model achieves an $F_1$ of 0.8 on a test set? A proper baseline is needed to evaluate the `goodness' of a performance score. Comparing with the latest state-of-the-art model is usually insightful. However, the state of the art can change rapidly as newer models are developed. At the other extreme, a simple dummy classifier could be used; however, it could be beaten too easily, making the comparison less valuable. This paper presents a universal baseline method for all binary classification models, named the Dutch Draw (DD). This approach weighs simple classifiers and determines the best classifier to use as a baseline. We theoretically derive the DD baseline for many commonly used evaluation measures and show that in most situations it reduces to (almost) always predicting either zero or one. In summary, the DD baseline is: (1) general, as it is applicable to all binary classification problems; (2) simple, as it is quickly determined without training or parameter tuning; (3) informative, as insightful conclusions can be drawn from the results. The DD baseline serves two purposes. First, it enables comparisons across research papers through a robust and universal baseline. Second, it provides a sanity check during the development process of a prediction model. It is a major warning sign when a model is outperformed by the DD baseline.  ( 2 min )
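The finding that the DD baseline mostly reduces to always predicting zero or one can be sanity-checked with a simplified sketch. The paper derives the baseline analytically over a family of simple classifiers; here, as an assumption-laden illustration, we merely evaluate the two constant classifiers under $F_1$ and keep the better one:

```python
def f1(y_true, y_pred):
    """Binary F1 score, with the convention F1 = 0 when undefined."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def constant_baseline(y_true, metric=f1):
    """DD-style sanity check: score the all-zero and all-one classifiers
    and return (best constant, scores per constant)."""
    scores = {c: metric(y_true, [c] * len(y_true)) for c in (0, 1)}
    return max(scores, key=scores.get), scores
```

A model that cannot beat `constant_baseline(y_true)[1]` on the chosen metric is, in the paper's terms, underperforming a training-free baseline.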
    Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization. (arXiv:2203.13167v1 [cs.CV])
In this paper, we investigate the continual learning of Vision Transformers (ViT) for the challenging exemplar-free scenario, with special focus on how to efficiently distill the knowledge of its crucial self-attention mechanism (SAM). Our work takes an initial step towards a surgical investigation of SAM for designing coherent continual learning methods in ViTs. We first carry out an evaluation of established continual learning regularization techniques. We then examine the effect of regularization when applied to two key enablers of SAM: (a) the contextualized embedding layers, for their ability to capture well-scaled representations with respect to the values, and (b) the prescaled attention maps, for carrying value-independent global contextual information. We depict the perks of each distilling strategy on two image recognition benchmarks (CIFAR100 and ImageNet-32) -- while (a) leads to better overall accuracy, (b) helps enhance rigidity by maintaining competitive performance. Furthermore, we identify the limitation imposed by the symmetric nature of regularization losses. To alleviate this, we propose an asymmetric variant and apply it to the pooled output distillation (POD) loss adapted for ViTs. Our experiments confirm that introducing asymmetry to POD boosts its plasticity while retaining stability across (a) and (b). Moreover, we observe low forgetting measures for all the compared methods, indicating that ViTs might be naturally inclined continual learners.  ( 2 min )
    Can Unsupervised Knowledge Transfer from Social Discussions Help Argument Mining?. (arXiv:2203.12881v1 [cs.CL])
Identifying argument components from unstructured texts and predicting the relationships expressed among them are two primary steps of argument mining. The intrinsic complexity of these tasks demands powerful learning models. While pretrained Transformer-based Language Models (LM) have been shown to provide state-of-the-art results over different NLP tasks, the scarcity of manually annotated data and the highly domain-dependent nature of argumentation restrict the capabilities of such models. In this work, we propose a novel transfer learning strategy to overcome these challenges. We utilize argumentation-rich social discussions from the ChangeMyView subreddit as a source of unsupervised, argumentative discourse-aware knowledge by finetuning pretrained LMs on a selectively masked language modeling task. Furthermore, we introduce a novel prompt-based strategy for inter-component relation prediction that complements our proposed finetuning method while leveraging the discourse context. Exhaustive experiments show the generalization capability of our method on these two tasks over within-domain as well as out-of-domain datasets, outperforming several existing and employed strong baselines.  ( 2 min )
    Deep Reinforcement Learning for Demand Driven Services in Logistics and Transportation Systems: A Survey. (arXiv:2108.04462v2 [cs.LG] UPDATED)
Recent technology development has brought numerous new Demand-Driven Services (DDS) into urban life, including ridesharing, on-demand delivery, express systems and warehousing. In DDS, a service loop is an elemental structure, comprising its service worker, the service providers and corresponding service targets. The service workers transport either humans or parcels from the providers to the target locations. Various planning tasks within DDS can thus be classified into two individual stages: 1) dispatching, which is to form service loops from demand/supply distributions, and 2) routing, which is to decide specific serving orders within the constructed loops. Generating high-quality strategies in both stages is important to develop DDS but faces several challenges. Meanwhile, deep reinforcement learning (DRL) has developed rapidly in recent years. It is a powerful tool to solve these problems since DRL can learn a parametric model without relying on too many problem-based assumptions and can optimize long-term effects by learning sequential decisions. In this survey, we first define DDS, then highlight common applications and the important decision/control problems within them. For each problem, we comprehensively introduce the existing DRL solutions. We also introduce open simulation environments for the development and evaluation of DDS applications. Finally, we analyze remaining challenges and discuss further research opportunities in DRL solutions for DDS.  ( 2 min )
    Identification of high order closure terms from fully kinetic simulations using machine learning. (arXiv:2110.09916v2 [physics.plasm-ph] UPDATED)
Simulations of large-scale plasma systems are typically based on a fluid approximation approach. These models construct a moment-based system of equations that approximates the particle-based physics as a fluid, but as a result they lack the small-scale physical processes available to fully kinetic models. Traditionally, empirical closure relations are used to close the moment-based system of equations, typically by approximating the pressure tensor or heat flux. The more accurate the closure relation, the more closely the simulation approaches kinetic results. In this paper, new closure terms are constructed using machine learning techniques. Two different machine learning models, a multi-layer perceptron and a gradient boosting regressor, synthesize a local closure relation for the pressure tensor and heat flux vector from fully kinetic simulations of a 2D magnetic reconnection problem. The models are compared to an existing closure relation for the pressure tensor, and the applicability of the models is discussed. The initial results show that the models can capture the diagonal components of the pressure tensor accurately, and show promising results for the heat flux, opening the way for new experiments in multi-scale modeling. We find that the sampling of the points used to train both models plays a crucial role in their accuracy.  ( 2 min )
    Data Smells in Public Datasets. (arXiv:2203.08007v2 [cs.SE] UPDATED)
    The adoption of Artificial Intelligence (AI) in high-stakes domains such as healthcare, wildlife preservation, autonomous driving and criminal justice system calls for a data-centric approach to AI. Data scientists spend the majority of their time studying and wrangling the data, yet tools to aid them with data analysis are lacking. This study identifies the recurrent data quality issues in public datasets. Analogous to code smells, we introduce a novel catalogue of data smells that can be used to indicate early signs of problems or technical debt in machine learning systems. To understand the prevalence of data quality issues in datasets, we analyse 25 public datasets and identify 14 data smells.  ( 2 min )
    Improving Generalization in Federated Learning by Seeking Flat Minima. (arXiv:2203.11834v2 [cs.LG] UPDATED)
Models trained in federated settings often suffer from degraded performance and fail at generalizing, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of the geometry of the loss surface and the Hessian eigenspectrum, linking the model's lack of generalization capacity to the sharpness of the solution. Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) and ii) averaging stochastic weights (SWA) on the server side can substantially improve generalization in Federated Learning and help bridge the gap with centralized models. By seeking parameters in neighborhoods having uniformly low loss, the model converges towards flatter minima and its generalization significantly improves in both homogeneous and heterogeneous scenarios. Empirical results demonstrate the effectiveness of those optimizers across a variety of benchmark vision datasets (e.g. CIFAR10/100, Landmarks-User-160k, IDDA) and tasks (large-scale classification, semantic segmentation, domain generalization).  ( 2 min )
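The SAM step that clients would run locally follows the standard two-part rule: first perturb the weights toward the worst case within a small $\rho$-ball, then descend using the gradient evaluated at the perturbed point. The one-dimensional sketch below is a generic illustration of that rule under stated defaults, not the paper's federated implementation:

```python
def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step for a scalar parameter.
    1) ascend: eps = rho * g / |g| (the worst-case perturbation in 1-D);
    2) descend: use the gradient at the perturbed point w + eps."""
    g = grad_fn(w)
    eps = 0.0 if g == 0 else rho * (1.0 if g > 0 else -1.0)
    g_adv = grad_fn(w + eps)  # gradient at the "sharpness-probing" point
    return w - lr * g_adv
```

Because the descent direction is taken at the perturbed point, minima surrounded by steep walls are penalized, which is the mechanism the abstract credits for flatter minima.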
    Multilingual CheckList: Generation and Evaluation. (arXiv:2203.12865v1 [cs.CL])
The recently proposed CheckList (Ribeiro et al., 2020) approach to the evaluation of NLP systems has revealed high failure rates for basic capabilities in multiple state-of-the-art and commercial models. However, the CheckList creation process is manual, which creates a bottleneck towards the creation of multilingual CheckLists catering to hundreds of languages. In this work, we explore multiple approaches to generate and evaluate the quality of Multilingual CheckLists. We devise an algorithm -- Automated Multilingual Checklist Generation (AMCG) -- for automatically transferring a CheckList from a source to a target language that relies on a reasonable machine translation system. We then compare the CheckList generated by AMCG with CheckLists generated with different levels of human intervention. Through in-depth crosslingual experiments between English and Hindi, and broad multilingual experiments spanning 11 languages, we show that the automatic approach can provide accurate estimates of the failure rates of a model across capabilities, as would a human-verified CheckList, and better than CheckLists generated by humans from scratch.  ( 2 min )
    GradViT: Gradient Inversion of Vision Transformers. (arXiv:2203.11894v2 [cs.CV] UPDATED)
In this work we demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks. During this attack, the original data batch is reconstructed given the model weights and the corresponding gradients. We introduce a method, named GradViT, that optimizes random noise into natural-looking images via an iterative process. The optimization objective consists of (i) a loss on matching the gradients, (ii) an image prior in the form of distance to the batch-normalization statistics of a pretrained CNN model, and (iii) a total variation regularization on patches to guide correct recovery locations. We propose a unique loss scheduling function to overcome local minima during optimization. We evaluate GradViT on the ImageNet1K and MS-Celeb-1M datasets, and observe unprecedentedly high fidelity and closeness to the original (hidden) data. During the analysis we find that vision transformers are significantly more vulnerable than previously studied CNNs due to the presence of the attention mechanism. Our method demonstrates new state-of-the-art results for gradient inversion in both qualitative and quantitative metrics. Project page at https://gradvit.github.io/.  ( 2 min )
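Of the three objective terms, the total variation regularizer is a standard image prior that penalizes differences between neighboring pixels. The sketch below computes anisotropic TV for a small 2-D image given as nested lists; the paper applies TV per patch with a scheduling scheme that is not reproduced here.

```python
def total_variation(img):
    """Anisotropic total variation of a 2-D image (nested lists):
    sum of absolute differences between vertically and horizontally
    adjacent pixels. Lower TV means a smoother image."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                tv += abs(img[i + 1][j] - img[i][j])  # vertical neighbor
            if j + 1 < w:
                tv += abs(img[i][j + 1] - img[i][j])  # horizontal neighbor
    return tv
```

Minimizing this term alongside the gradient-matching loss discourages the noisy, high-frequency artifacts that pure gradient matching tends to produce.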
    Using Orientation to Distinguish Overlapping Chromosomes. (arXiv:2203.13004v1 [cs.LG])
A difficult step in the process of karyotyping is segmenting chromosomes that touch or overlap. In an attempt to automate the process, previous studies turned to deep learning methods, with some formulating the task as a semantic segmentation problem. These models treat separate chromosome instances as semantic classes, which we show to be problematic, since it is uncertain which chromosome should be classed as #1 and which as #2. Assigning class labels based on comparison rules, such as the shorter/longer chromosome, alleviates but does not fully resolve the issue. Instead, we separate the chromosome instances in a second stage, having the model predict the orientation of the chromosomes and using it as one of the key distinguishing factors between them. We demonstrate this method to be effective. Furthermore, we introduce a novel Double-Angle representation that a neural network can use to predict the orientation. The representation maps any direction and its reverse to the same point. Lastly, we present a new expanded synthetic dataset, which is based on Pommier's dataset, but addresses its issues with insufficient separation between its training and testing sets.  ( 2 min )
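The stated property of the Double-Angle representation -- a direction and its reverse map to the same point -- follows from doubling the angle before taking sine and cosine, since $\cos(2(\theta+\pi)) = \cos(2\theta)$ and likewise for sine. A minimal sketch (illustrating the stated property only; the paper's exact parameterization may differ):

```python
import math

def double_angle(theta):
    """Map an orientation angle to a 2-D point on the unit circle such that
    theta and theta + pi (a direction and its reverse) coincide."""
    return (math.cos(2.0 * theta), math.sin(2.0 * theta))
```

This makes the representation well suited to orientation regression, where a chromosome's axis has no preferred "forward" end and a raw angle target would be ambiguous up to 180 degrees.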
    Mono vs Multilingual BERT: A Case Study in Hindi and Marathi Named Entity Recognition. (arXiv:2203.12907v1 [cs.CL])
Named entity recognition (NER) is the process of recognising and classifying important information (entities) in text. Proper nouns, such as a person's name, an organization's name, or a location's name, are examples of entities. NER is one of the important modules in applications such as human resources, customer support, search engines, content classification, and academia. In this work, we consider NER for low-resource Indian languages like Hindi and Marathi. Transformer-based models have been widely used for NER tasks. We consider different variations of BERT like base-BERT, RoBERTa, and AlBERT and benchmark them on publicly available Hindi and Marathi NER datasets. We provide an exhaustive comparison of different monolingual and multilingual transformer-based models and establish simple baselines currently missing in the literature. We show that the monolingual MahaRoBERTa model performs best for Marathi NER whereas the multilingual XLM-RoBERTa performs best for Hindi NER. We also perform cross-language evaluation and present mixed observations.  ( 2 min )
    Transformer Compressed Sensing via Global Image Tokens. (arXiv:2203.12861v1 [cs.CV])
Convolutional neural networks (CNNs) have demonstrated outstanding Compressed Sensing (CS) performance compared to traditional, hand-crafted methods. However, they are broadly limited in terms of generalisability, inductive bias and difficulty in modeling long-distance relationships. Transformer neural networks (TNNs) overcome such issues by implementing an attention mechanism designed to capture dependencies between inputs. However, high-resolution tasks typically require vision Transformers (ViTs) to decompose an image into patch-based tokens, limiting inputs to inherently local contexts. We propose a novel image decomposition that naturally embeds images into low-resolution inputs. These Kaleidoscope tokens (KD) provide a mechanism for global attention, at the same computational cost as a patch-based approach. To showcase this development, we replace CNN components in a well-known CS-MRI neural network with TNN blocks and demonstrate the improvements afforded by KD. We also propose an ensemble of image tokens, which enhances overall image quality and reduces model size. Supplementary material is available at https://github.com/uqmarlonbran/TCS.git  ( 2 min )
    Personalized incentives as feedback design in generalized Nash equilibrium problems. (arXiv:2203.12948v1 [math.OC])
We investigate both stationary and time-varying, nonmonotone generalized Nash equilibrium problems that exhibit symmetric interactions among the agents, which are known to be potential. As may happen in practical cases, however, we envision a scenario in which the formal expression of the underlying potential function is not available, and we design a semi-decentralized Nash equilibrium seeking algorithm. In the proposed two-layer scheme, a coordinator iteratively integrates the (possibly noisy and sporadic) agents' feedback to learn the pseudo-gradients of the agents, and then designs personalized incentives for them. On their side, the agents receive those personalized incentives, compute a solution to an extended game, and then return feedback measurements to the coordinator. In the stationary setting, our algorithm returns a Nash equilibrium in case the coordinator is endowed with standard learning policies, while it returns a Nash equilibrium up to a constant, yet adjustable, error in the time-varying case. As a motivating application, we consider the ridehailing service provided by several companies with mobility as a service orchestration, necessary to both handle competition among firms and avoid traffic congestion, which is also adopted to run numerical experiments verifying our results.  ( 2 min )
    Distributionally Robust Optimization via Ball Oracle Acceleration. (arXiv:2203.13225v1 [math.OC])
    We develop and analyze algorithms for distributionally robust optimization (DRO) of convex losses. In particular, we consider group-structured and bounded $f$-divergence uncertainty sets. Our approach relies on an accelerated method that queries a ball optimization oracle, i.e., a subroutine that minimizes the objective within a small ball around the query point. Our main contribution is efficient implementations of this oracle for DRO objectives. For DRO with $N$ non-smooth loss functions, the resulting algorithms find an $\epsilon$-accurate solution with $\widetilde{O}\left(N\epsilon^{-2/3} + \epsilon^{-2}\right)$ first-order oracle queries to individual loss functions. Compared to existing algorithms for this problem, we improve complexity by a factor of up to $\epsilon^{-4/3}$.  ( 2 min )
    Development of a Vertex Finding Algorithm using Recurrent Neural Network. (arXiv:2101.11906v4 [physics.data-an] UPDATED)
Deep learning is a rapidly evolving technology with the potential to significantly improve the physics reach of collider experiments. In this study we developed a novel vertex-finding algorithm for future lepton colliders such as the International Linear Collider. We deploy two networks: one consists of simple fully-connected layers to look for vertex seeds from track pairs, and the other is a customized Recurrent Neural Network with an attention mechanism and an encoder-decoder structure to associate tracks to the vertex seeds. The performance of the vertex finder is compared with the standard ILC reconstruction algorithm.
    Fast Sparse Decision Tree Optimization via Reference Ensembles. (arXiv:2112.00798v3 [cs.LG] UPDATED)
    Sparse decision tree optimization has been one of the most fundamental problems in AI since its inception and is a challenge at the core of interpretable machine learning. Sparse decision tree optimization is computationally hard, and despite steady effort since the 1960's, breakthroughs have only been made on the problem within the past few years, primarily on the problem of finding optimal sparse decision trees. However, current state-of-the-art algorithms often require impractical amounts of computation time and memory to find optimal or near-optimal trees for some real-world datasets, particularly those having several continuous-valued features. Given that the search spaces of these decision tree optimization problems are massive, can we practically hope to find a sparse decision tree that competes in accuracy with a black box machine learning model? We address this problem via smart guessing strategies that can be applied to any optimal branch-and-bound-based decision tree algorithm. We show that by using these guesses, we can reduce the run time by multiple orders of magnitude, while providing bounds on how far the resulting trees can deviate from the black box's accuracy and expressive power. Our approach enables guesses about how to bin continuous features, the size of the tree, and lower bounds on the error for the optimal decision tree. Our experiments show that in many cases we can rapidly construct sparse decision trees that match the accuracy of black box models. To summarize: when you are having trouble optimizing, just guess.  ( 2 min )
    Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors. (arXiv:2203.13131v1 [cs.CV])
Recent text-to-image generation methods provide a simple yet exciting conversion capability between text and image domains. While these methods have incrementally improved the generated image fidelity and text relevancy, several pivotal gaps remain, limiting applicability and quality. We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mechanism complementary to text in the form of a scene, (ii) introducing elements that substantially improve the tokenization process by employing domain-specific knowledge over key image regions (faces and salient objects), and (iii) adapting classifier-free guidance for the transformer use case. Our model achieves state-of-the-art FID and human evaluation results, unlocking the ability to generate high fidelity images in a resolution of 512x512 pixels, significantly improving visual quality. Through scene controllability, we introduce several new capabilities: (i) Scene editing, (ii) text editing with anchor scenes, (iii) overcoming out-of-distribution text prompts, and (iv) story illustration generation, as demonstrated in the story we wrote.
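Classifier-free guidance, which the paper adapts to the transformer use case, combines conditional and unconditional predictions with a guidance scale. The sketch below shows only the generic combination rule on plain score lists, not the paper's transformer-specific adaptation:

```python
def cfg(uncond, cond, scale):
    """Classifier-free guidance: push the conditional scores away from the
    unconditional ones by a guidance scale. scale = 1 recovers the plain
    conditional prediction; scale > 1 strengthens prompt adherence."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```

At sampling time the model is evaluated twice per step (with and without the conditioning), and the guided scores are what the sampler actually uses.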
    FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning. (arXiv:2103.13822v3 [cs.LG] UPDATED)
Client-wise data heterogeneity is one of the major issues that hinder effective training in federated learning (FL). Since the data distribution on each client may vary dramatically, the client selection strategy can significantly influence the convergence rate of the FL process. Active client selection strategies have been widely proposed in recent studies. However, they neglect the loss correlations between the clients and achieve only marginal improvement compared to the uniform selection strategy. In this work, we propose FedCor, an FL framework built on a correlation-based client selection strategy, to boost the convergence rate of FL. Specifically, we first model the loss correlations between the clients with a Gaussian Process (GP). Based on the GP model, we derive a client selection strategy with a significant reduction of expected global loss in each round. Besides, we develop an efficient GP training method with low communication overhead in the FL scenario by utilizing covariance stationarity. Our experimental results show that, compared to the state-of-the-art method, FedCor can improve the convergence rates by $34\%\sim 99\%$ and $26\%\sim 51\%$ on FMNIST and CIFAR-10, respectively.
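The GP-based selection idea can be sketched as a greedy rule that conditions a loss-covariance model on each chosen client. This is a simplified illustration under an assumed covariance matrix, not FedCor's exact criterion:

```python
import numpy as np

def select_clients(cov, k):
    """Greedily pick k clients whose losses are most informative about
    everyone else's under a GP prior with covariance `cov`: each pick
    maximizes the total posterior-variance reduction, then the
    covariance is conditioned on the chosen client."""
    chosen, cov = [], cov.astype(float).copy()
    for _ in range(k):
        gains = (cov ** 2).sum(axis=1) / (np.diag(cov) + 1e-12)
        gains[chosen] = -np.inf            # never re-pick a client
        i = int(np.argmax(gains))
        chosen.append(i)
        cov = cov - np.outer(cov[:, i], cov[i, :]) / cov[i, i]  # GP update
    return chosen

# Clients 0 and 1 are highly correlated; client 2 is independent.
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
print(select_clients(cov, 2))  # -> [0, 2]
```

After picking one of the two highly correlated clients, the uncorrelated third client becomes more informative than the remaining twin, which is exactly the behavior uniform sampling misses.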
    On the Implicit Bias Towards Minimal Depth of Deep Neural Networks. (arXiv:2202.09028v3 [cs.LG] UPDATED)
We study the implicit bias of gradient-based training methods to favor low-depth solutions when training deep neural networks. Recent results in the literature suggest that penultimate layer representations learned by a classifier over multiple classes exhibit a clustering property, called neural collapse. We demonstrate empirically that the neural collapse property extends beyond the penultimate layer and tends to emerge in intermediate layers as well. In this regard, we hypothesize that gradient-based methods are implicitly biased towards selecting neural networks of minimal depth for achieving this clustering property.
    LHNN: Lattice Hypergraph Neural Network for VLSI Congestion Prediction. (arXiv:2203.12831v1 [cs.LG])
Precise congestion prediction from a placement solution plays a crucial role in circuit placement. This work proposes the lattice hypergraph (LH-graph), a novel graph formulation for circuits, which preserves netlist data throughout the learning process and enables congestion information to be propagated geometrically and topologically. Based on this formulation, we further develop a heterogeneous graph neural network architecture, LHNN, which jointly performs routing demand regression to support congestion spot classification. LHNN consistently achieves more than 35% improvement over U-nets and Pix2Pix on the F1 score. We expect our work to highlight essential procedures in using machine learning for congestion prediction.
    gACSON software for automated segmentation and morphology analyses of myelinated axons in 3D electron microscopy. (arXiv:2112.06476v2 [eess.IV] UPDATED)
    Background and Objective: Advances in electron microscopy (EM) now allow three-dimensional (3D) imaging of hundreds of micrometers of tissue with nanometer-scale resolution, providing new opportunities to study the ultrastructure of the brain. In this work, we introduce a freely available Matlab-based gACSON software for visualization, segmentation, assessment, and morphology analysis of myelinated axons in 3D-EM volumes of brain tissue samples. Methods: The software is equipped with a graphical user interface (GUI). It automatically segments the intra-axonal space of myelinated axons and their corresponding myelin sheaths and allows manual segmentation, proofreading, and interactive correction of the segmented components. gACSON analyzes the morphology of myelinated axons, such as axonal diameter, axonal eccentricity, myelin thickness, or g-ratio. Results: We illustrate the use of the software by segmenting and analyzing myelinated axons in six 3D-EM volumes of rat somatosensory cortex after sham surgery or traumatic brain injury (TBI). Our results suggest that the equivalent diameter of myelinated axons in somatosensory cortex was decreased in TBI animals five months after the injury. Conclusions: Our results indicate that gACSON is a valuable tool for visualization, segmentation, assessment, and morphology analysis of myelinated axons in 3D-EM volumes. It is freely available at https://github.com/AndreaBehan/g-ACSON under the MIT license.
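Of the morphology metrics listed, the g-ratio follows from the segmented areas via its standard definition: inner (axon) equivalent diameter over outer (axon plus myelin) equivalent diameter. A small sketch, independent of the gACSON code base:

```python
import math

def g_ratio(axon_area, myelin_area):
    """g-ratio from cross-sectional areas: equivalent inner diameter
    (axon) over equivalent outer diameter (axon + myelin sheath),
    where the equivalent diameter of area A is 2*sqrt(A/pi)."""
    d_inner = 2.0 * math.sqrt(axon_area / math.pi)
    d_outer = 2.0 * math.sqrt((axon_area + myelin_area) / math.pi)
    return d_inner / d_outer

print(round(g_ratio(1.0, 1.0), 3))  # -> 0.707
```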
    Supervised Training of Siamese Spiking Neural Networks with Earth's Mover Distance. (arXiv:2203.13207v1 [cs.NE])
This study adapts the highly versatile siamese neural network model to the event data domain. We introduce a supervised training framework for optimizing Earth's Mover Distance (EMD) between spike trains with spiking neural networks (SNN). We train this model on images of the MNIST dataset converted into the spiking domain with novel conversion schemes. The quality of the siamese embeddings of input images was evaluated by measuring the classifier performance for different dataset coding types. The models achieved performance similar to existing SNN-based approaches (F1-score of up to 0.9386) while using only about 15% of hidden layer neurons to classify each example. Furthermore, models which did not employ a sparse neural code were about 45% slower than their sparse counterparts. These properties make the model suitable for low-energy-consumption and low-prediction-latency applications.
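For two spike trains with the same number of spikes, the EMD has a simple closed form that makes it natural as a training target: sort the spike times and average the absolute differences (monotone matching is optimal for 1-D point sets). A sketch of that special case only; the framework itself handles the general setting:

```python
def spike_train_emd(train_a, train_b):
    """EMD between two spike trains with equal spike counts: for 1-D
    point sets it reduces to the mean absolute difference between the
    sorted spike times (monotone optimal matching)."""
    assert len(train_a) == len(train_b), "equal-count special case only"
    return sum(abs(a - b) for a, b in
               zip(sorted(train_a), sorted(train_b))) / len(train_a)

print(spike_train_emd([1.0, 3.0, 7.0], [1.5, 3.0, 6.0]))  # -> 0.5
```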
    High pressure hydrogen by machine learning and quantum Monte Carlo. (arXiv:2112.11099v2 [cond-mat.str-el] UPDATED)
We have developed a technique combining the accuracy of quantum Monte Carlo in describing the electron correlation with the efficiency of a Machine Learning Potential (MLP). We use kernel regression in combination with SOAP (Smooth Overlap of Atomic Position) features, implemented here in a very efficient way. The key ingredients are: i) a sparsification technique, based on farthest point sampling, ensuring generality and transferability of our MLPs, and ii) the so-called $\Delta$-learning, allowing a small training data set, a fundamental property for highly accurate but computationally demanding calculations, such as the ones based on quantum Monte Carlo. As a first application, we present a benchmark study of the liquid-liquid transition of high-pressure hydrogen and show the quality of our MLP by emphasizing the importance of high accuracy for this much-debated subject, where experiments are difficult in the lab, and theory is still far from being conclusive.
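The $\Delta$-learning ingredient is the idea of fitting only the difference between a cheap baseline and the expensive QMC reference, so the learned correction is small and smooth and needs few training points. A toy linear sketch with made-up descriptors and energies (the paper uses kernel regression on SOAP features):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))            # toy descriptors (stand-ins for SOAP)
w_cheap = np.array([1.0, -0.5, 0.2])    # made-up cheap baseline model
e_cheap = X @ w_cheap
e_qmc = e_cheap + 0.1 * X[:, 0]         # "QMC" differs by a small correction

# Delta-learning: fit the difference e_qmc - e_cheap, not e_qmc itself.
# The target is small and smooth, so few training points suffice.
w_delta, *_ = np.linalg.lstsq(X, e_qmc - e_cheap, rcond=None)

def predict(x):
    return x @ w_cheap + x @ w_delta    # cheap baseline + learned delta
```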
    Addressing Missing Sources with Adversarial Support-Matching. (arXiv:2203.13154v1 [stat.ML])
    When trained on diverse labeled data, machine learning models have proven themselves to be a powerful tool in all facets of society. However, due to budget limitations, deliberate or non-deliberate censorship, and other problems during data collection and curation, the labeled training set might exhibit a systematic shortage of data for certain groups. We investigate a scenario in which the absence of certain data is linked to the second level of a two-level hierarchy in the data. Inspired by the idea of protected groups from algorithmic fairness, we refer to the partitions carved by this second level as "subgroups"; we refer to combinations of subgroups and classes, or leaves of the hierarchy, as "sources". To characterize the problem, we introduce the concept of classes with incomplete subgroup support. The representational bias in the training set can give rise to spurious correlations between the classes and the subgroups which render standard classification models ungeneralizable to unseen sources. To overcome this bias, we make use of an additional, diverse but unlabeled dataset, called the "deployment set", to learn a representation that is invariant to subgroup. This is done by adversarially matching the support of the training and deployment sets in representation space. In order to learn the desired invariance, it is paramount that the sets of samples observed by the discriminator are balanced by class; this is easily achieved for the training set, but requires using semi-supervised clustering for the deployment set. We demonstrate the effectiveness of our method with experiments on several datasets and variants of the problem.
    ErfAct and Pserf: Non-monotonic Smooth Trainable Activation Functions. (arXiv:2109.04386v4 [cs.NE] UPDATED)
An activation function is a crucial component of a neural network that introduces non-linearity into the network. The state-of-the-art performance of a neural network also depends on choosing the right activation function. We propose two novel non-monotonic smooth trainable activation functions, called ErfAct and Pserf. Experiments suggest that the proposed functions improve network performance significantly compared to widely used activations like ReLU, Swish, and Mish. Replacing ReLU with ErfAct and Pserf, we obtain 5.68% and 5.42% improvements in top-1 accuracy with the ShuffleNet V2 (2.0x) network on the CIFAR100 dataset, 2.11% and 1.96% improvements in top-1 accuracy with the ShuffleNet V2 (2.0x) network on the CIFAR10 dataset, and 1.0% and 1.0% improvements in mean average precision (mAP) with the SSD300 model on the Pascal VOC dataset.
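The abstract does not give the functional forms of ErfAct and Pserf. Purely as an illustration of what a smooth, non-monotonic, trainable activation of this kind looks like, here is a sketch using x * erf(alpha * exp(beta * x)), one plausible parameterization (an assumption, not the published definition), with alpha and beta as learnable per-layer scalars:

```python
import math

def erfact(x, alpha=0.75, beta=0.75):
    """Hypothetical smooth, non-monotonic, trainable activation:
    x * erf(alpha * exp(beta * x)); alpha and beta would be learned
    per layer. Behaves like the identity for large positive x and dips
    below zero for moderately negative x, similar to Swish or Mish."""
    return x * math.erf(alpha * math.exp(beta * x))
```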
    Representation of binary classification trees with binary features by quantum circuits. (arXiv:2108.13207v2 [quant-ph] UPDATED)
    We propose a quantum representation of binary classification trees with binary features based on a probabilistic approach. By using the quantum computer as a processor for probability distributions, a probabilistic traversal of the decision tree can be realized via measurements of a quantum circuit. We describe how tree inductions and the prediction of class labels of query data can be integrated into this framework. An on-demand sampling method enables predictions with a constant number of classical memory slots, independent of the tree depth. We experimentally study our approach using both a quantum computing simulator and actual IBM quantum hardware. To our knowledge, this is the first realization of a decision tree classifier on a quantum device.
    Multi-armed bandits for online optimization of language model pre-training: the use case of dynamic masking. (arXiv:2203.13151v1 [cs.CL])
Transformer-based language models (TLMs) provide state-of-the-art performance in many modern natural language processing applications. TLM training is conducted in two phases. First, the model is pre-trained over large volumes of text to minimize a generic objective function, such as the Masked Language Model (MLM). Second, the model is fine-tuned on specific downstream tasks. Pre-training requires large volumes of data and high computational resources, while introducing many still unresolved design choices. For instance, selecting hyperparameters for language model pre-training is often carried out based on heuristics or grid-based searches. In this work, we propose a multi-armed bandit-based online optimization framework for the sequential selection of pre-training hyperparameters to optimize language model performance. We pose the pre-training procedure as a sequential decision-making task, where at each pre-training step, an agent must determine what hyperparameters to use towards optimizing the pre-training objective. We propose a Thompson sampling bandit algorithm, based on a surrogate Gaussian process reward model of the MLM pre-training objective, for its sequential minimization. We empirically show how the proposed Gaussian-process-based Thompson sampling pre-trains robust and well-performing language models. Namely, by sequentially selecting masking hyperparameters of the TLM, we achieve satisfactory performance in fewer epochs, not only in terms of the pre-training MLM objective, but in diverse downstream fine-tuning tasks. The proposed bandit-based technique provides an automated hyperparameter selection method for pre-training TLMs of interest to practitioners. In addition, our results indicate that, instead of MLM pre-training with fixed masking probabilities, sequentially adapting the masking hyperparameters improves both pre-training loss and downstream task metrics.
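The bandit loop can be sketched with independent Gaussian arms standing in for the paper's Gaussian-process surrogate: sample one plausible reward per candidate masking probability, pre-train for a step with the best sample, then update that arm's statistics. All names and the arm grid below are illustrative:

```python
import random

def thompson_step(stats, arms):
    """One Thompson-sampling step over candidate masking probabilities.
    stats[a] = (pulls, running mean reward) for arm a, where reward is
    e.g. the negative validation MLM loss after pre-training with a."""
    draws = {}
    for a in arms:
        n, mean = stats.get(a, (0, 0.0))
        sigma = 1.0 / (n + 1) ** 0.5       # posterior width shrinks with pulls
        draws[a] = random.gauss(mean, sigma)
    return max(draws, key=draws.get)       # masking prob for the next step

def update(stats, arm, reward):
    n, mean = stats.get(arm, (0, 0.0))
    stats[arm] = (n + 1, mean + (reward - mean) / (n + 1))
```

Unexplored arms keep a wide posterior and so still get sampled occasionally, which is what lets the schedule adapt instead of locking onto a fixed masking probability.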
    Out-of-distribution Generalization with Causal Invariant Transformations. (arXiv:2203.11528v3 [stat.ML] UPDATED)
    In real-world applications, it is important and desirable to learn a model that performs well on out-of-distribution (OOD) data. Recently, causality has become a powerful tool to tackle the OOD generalization problem, with the idea resting on the causal mechanism that is invariant across domains of interest. To leverage the generally unknown causal mechanism, existing works assume a linear form of causal feature or require sufficiently many and diverse training domains, which are usually restrictive in practice. In this work, we obviate these assumptions and tackle the OOD problem without explicitly recovering the causal feature. Our approach is based on transformations that modify the non-causal feature but leave the causal part unchanged, which can be either obtained from prior knowledge or learned from the training data in the multi-domain scenario. Under the setting of invariant causal mechanism, we theoretically show that if all such transformations are available, then we can learn a minimax optimal model across the domains using only single domain data. Noticing that knowing a complete set of these causal invariant transformations may be impractical, we further show that it suffices to know only a subset of these transformations. Based on the theoretical findings, a regularized training procedure is proposed to improve the OOD generalization capability. Extensive experimental results on both synthetic and real datasets verify the effectiveness of the proposed algorithm, even with only a few causal invariant transformations.
    Optimizing Variational Representations of Divergences and Accelerating their Statistical Estimation. (arXiv:2006.08781v3 [cs.LG] UPDATED)
    Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) They can be estimated from data as statistical averages. 2) Such representations can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving the tightness of such variational formulas, and accordingly accelerate statistical learning and estimation from data, is currently lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, the calculation of the functional Hessian of objective functionals unveils the local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations utilizing neural network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.
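Advantage (1), estimation by statistical averages, is easy to see for the Donsker-Varadhan representation of the KL divergence, KL(P||Q) = sup_f E_P[f] - log E_Q[exp f]: any fixed critic f yields a lower bound computable from sample means. A sketch with a hand-picked critic (in practice f is a trained neural network, which is how the bound is tightened):

```python
import math, random

def dv_kl_estimate(xs_p, xs_q, critic):
    """Donsker-Varadhan lower bound on KL(P || Q):
    E_P[f] - log E_Q[exp f], estimated from samples as plain averages.
    Any critic f gives a valid lower bound; training f tightens it."""
    term_p = sum(critic(x) for x in xs_p) / len(xs_p)
    term_q = math.log(sum(math.exp(critic(x)) for x in xs_q) / len(xs_q))
    return term_p - term_q

random.seed(0)
p = [random.gauss(1.0, 1.0) for _ in range(20000)]   # P = N(1, 1)
q = [random.gauss(0.0, 1.0) for _ in range(20000)]   # Q = N(0, 1)
# The optimal critic here is the log density ratio f*(x) = x - 1/2,
# so the estimate should sit near the true KL(P||Q) = 1/2.
est = dv_kl_estimate(p, q, lambda x: x - 0.5)
```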
    Spoofing Generalization: When Can't You Trust Proprietary Models?. (arXiv:2106.08393v2 [cs.LG] UPDATED)
In this work, we study the computational complexity of determining whether a machine learning model that perfectly fits the training data will generalize to unseen data. In particular, we study the power of a malicious agent whose goal is to construct a model g that fits its training data and nothing else, but is indistinguishable from an accurate model f. We say that g strongly spoofs f if no polynomial-time algorithm can tell them apart. If instead we restrict to algorithms that run in $n^c$ time for some fixed $c$, we say that g c-weakly spoofs f. Our main results are: (1) under cryptographic assumptions, strong spoofing is possible, and (2) for any $c > 0$, c-weak spoofing is possible unconditionally. While the assumption of a malicious agent is an extreme scenario (hopefully companies training large models are not malicious), we believe that it sheds light on the inherent difficulties of blindly trusting large proprietary models or data.
    Dexterous Imitation Made Easy: A Learning-Based Framework for Efficient Dexterous Manipulation. (arXiv:2203.13251v1 [cs.RO])
Optimizing behaviors for dexterous manipulation has been a longstanding challenge in robotics, with a variety of methods, from model-based control to model-free reinforcement learning, having been explored in the literature. Perhaps one of the most powerful techniques to learn complex manipulation strategies is imitation learning. However, collecting and learning from demonstrations in dexterous manipulation is quite challenging. The complex, high-dimensional action space involved with multi-finger control often leads to poor sample efficiency of learning-based methods. In this work, we propose 'Dexterous Imitation Made Easy' (DIME), a new imitation learning framework for dexterous manipulation. DIME only requires a single RGB camera to observe a human operator and teleoperate our robotic hand. Once demonstrations are collected, DIME employs standard imitation learning methods to train dexterous manipulation policies. On both simulation and real robot benchmarks we demonstrate that DIME can be used to solve complex, in-hand manipulation tasks such as 'flipping', 'spinning', and 'rotating' objects with the Allegro hand. Our framework along with pre-collected demonstrations is publicly available at https://nyu-robot-learning.github.io/dime.
    DPST: De Novo Peptide Sequencing with Amino-Acid-Aware Transformers. (arXiv:2203.13132v1 [q-bio.QM])
De novo peptide sequencing aims to recover amino acid sequences of a peptide from tandem mass spectrometry (MS) data. Existing approaches for de novo analysis enumerate MS evidence for all amino acid classes during inference. This leads to over-trimming of the receptive fields of MS data and restricts the MS evidence associated with subsequent, not-yet-decoded amino acids. Our approach, DPST, circumvents these limitations with two key components: (1) a confidence value aggregation encoder to sketch spectrum representations according to amino-acid-based connectivity among MS; (2) a global-local fusion decoder to progressively assimilate contextualized spectrum representations with a predefined preconception of localized MS evidence and amino acid priors. Our components originate from a closed-form solution and selectively attend to informative amino-acid-aware MS representations. Through extensive empirical studies, we demonstrate the superiority of DPST, showing that it outperforms state-of-the-art approaches by a margin of 12%-19% in peptide accuracy.
    Rubik's Cube Operator: A Plug And Play Permutation Module for Better Arranging High Dimensional Industrial Data in Deep Convolutional Processes. (arXiv:2203.12921v1 [cs.LG])
The convolutional neural network (CNN) has been widely applied to process industrial-data-based tensor input, which integrates data records of distributed industrial systems from the spatial, temporal, and system-dynamics aspects. However, unlike images, information in the industrial-data-based tensor is not necessarily spatially ordered. Thus, directly applying CNNs is ineffective. To tackle this issue, we propose a plug-and-play module, the Rubik's Cube Operator (RCO), to adaptively permute the data organization of the industrial-data-based tensor to an optimal or suboptimal order of attributes before it is processed by CNNs; the RCO can be updated together with the subsequent CNNs via a gradient-based optimizer. The proposed RCO maintains K binary and right-stochastic permutation matrices to permute attributes along the K axes of the input industrial-data-based tensor. A novel learning process is proposed to enable learning permutation matrices from data, where the Gumbel-Softmax is employed to reparameterize elements of the permutation matrices, and a soft regularization loss is proposed and added to the task-specific loss to ensure the feature diversity of the permuted data. We verify the effectiveness of the proposed RCO on two representative learning tasks that process industrial data with CNNs: wind power prediction (WPP) and wind speed prediction (WSP) from the renewable energy domain. Computational experiments are conducted on four datasets collected from different wind farms, and the results demonstrate that the proposed RCO can significantly improve the performance of CNN-based networks.
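The Gumbel-Softmax reparameterization of a permutation matrix can be sketched as Gumbel-perturbed logits followed by a row-wise softmax, giving a differentiable right-stochastic relaxation that hardens as the temperature drops. A minimal illustration only; the RCO additionally enforces the binary structure and adds a diversity regularizer:

```python
import numpy as np

def soft_permutation(logits, tau, rng):
    """Gumbel-Softmax relaxation of a permutation matrix: perturb the
    learnable logits with Gumbel noise and apply a row-wise softmax.
    The result is right-stochastic (rows sum to 1) and hardens toward
    a 0/1 permutation as the temperature tau -> 0."""
    z = (logits + rng.gumbel(size=logits.shape)) / tau
    z -= z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

P = soft_permutation(np.eye(3) * 5.0, tau=0.1, rng=np.random.default_rng(0))
print(P.round(2))
```

Because every entry of P is produced by a softmax, gradients flow through the relaxed permutation into the logits, which is what allows the attribute ordering to be learned jointly with the downstream CNN.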
    A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces. (arXiv:2007.05078v2 [cs.LG] UPDATED)
    In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time, which quantifies its level of non-stationarity. Our method generalizes previous approaches based on sliding windows and exponential discounting used to handle changing environments. We further propose a practical implementation of KeRNS, we analyze its regret and validate it experimentally.
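The time-dependent kernel idea generalizes both sliding windows and exponential discounting: the weight of a past transition is a kernel on the state-action metric times a decay in elapsed episodes. A toy one-dimensional sketch, with illustrative (not KeRNS's actual) kernel shapes:

```python
def time_dependent_kernel(x, t, xi, ti, bandwidth=1.0, discount=0.9):
    """Weight of past transition (xi, ti) when estimating the MDP at
    state-action x in episode t: a kernel on the state-action metric
    times an exponential decay in elapsed episodes, so stale data fades
    as the environment drifts. Kernel shapes here are illustrative."""
    k_space = max(0.0, 1.0 - abs(x - xi) / bandwidth)  # triangular kernel
    k_time = discount ** (t - ti)                      # discounting in time
    return k_space * k_time
```

Setting `discount=1` recovers a purely spatial kernel estimate; a hard cutoff in `k_time` would recover a sliding window.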
    Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies. (arXiv:2203.12922v1 [cs.LG])
This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes (MDP) that enjoys a regret bound \emph{independent of the planning horizon}. Specifically, we consider a tabular MDP with $S$ states, $A$ actions, a planning horizon $H$, total reward bounded by $1$, and an agent that plays for $K$ episodes. We design an algorithm that achieves an $O\left(\mathrm{poly}(S,A,\log K)\sqrt{K}\right)$ regret, in contrast to existing bounds which either have an additional $\mathrm{polylog}(H)$ dependency~\citep{zhang2020reinforcement} or an exponential dependency on $S$~\citep{li2021settling}. Our result relies on a sequence of new structural lemmas establishing the approximation power, stability, and concentration properties of stationary policies, which can have applications in other problems related to Markov chains.
    LAFITE: Towards Language-Free Training for Text-to-Image Generation. (arXiv:2111.13792v3 [cs.CV] UPDATED)
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs. While image samples are often easily accessible, the associated text descriptions typically require careful human captioning, which is particularly time- and cost-consuming. In this paper, we propose the first work to train text-to-image generation models without any text data. Our method leverages the well-aligned multi-modal semantic space of the powerful pre-trained CLIP model: the requirement of text-conditioning is seamlessly alleviated via generating text features from image features. Extensive experiments are conducted to illustrate the effectiveness of the proposed method. We obtain state-of-the-art results in the standard text-to-image generation tasks. Importantly, the proposed language-free model outperforms most existing models trained with full image-text pairs. Furthermore, our method can be applied in fine-tuning pre-trained models, which saves both training time and cost in training text-to-image generation models. Our pre-trained model obtains competitive results in zero-shot text-to-image generation on the MS-COCO dataset, yet with around only 1% of the model size and training data size relative to the recently proposed large DALL-E model.
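The language-free conditioning trick can be sketched as follows: since CLIP image and text embeddings live in a shared space, a noise-perturbed, renormalized image embedding can stand in for the missing text embedding during training. The perturbation scheme below is a simplified assumption, not the paper's exact recipe:

```python
import numpy as np

def pseudo_text_feature(img_feat, noise_level, rng):
    """Stand-in for a CLIP text embedding: perturb the normalized image
    embedding with scaled Gaussian noise and renormalize. noise_level
    is a hypothetical knob; the paper studies specific schemes."""
    h = img_feat / np.linalg.norm(img_feat)
    n = rng.normal(size=h.shape)
    h_tilde = h + noise_level * n / np.linalg.norm(n)
    return h_tilde / np.linalg.norm(h_tilde)

v = pseudo_text_feature(np.ones(512), noise_level=0.1,
                        rng=np.random.default_rng(0))
```

The noise keeps the generator from collapsing onto exact image embeddings while the pseudo feature stays close, in cosine similarity, to where a real caption embedding would land.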
    Avalanche RL: a Continual Reinforcement Learning Library. (arXiv:2202.13657v2 [cs.LG] UPDATED)
Continual Reinforcement Learning (CRL) is a challenging setting where an agent learns to interact with an environment that is constantly changing over time (the stream of experiences). In this paper, we describe Avalanche RL, a library for Continual Reinforcement Learning which makes it easy to train agents on a continuous stream of tasks. Avalanche RL is based on PyTorch and supports any OpenAI Gym environment. Its design is based on Avalanche, one of the more popular continual learning libraries, which allows us to reuse a large number of continual learning strategies and improves the interaction between reinforcement learning and continual learning researchers. Additionally, we propose Continual Habitat-Lab, a novel benchmark and high-level library which enables the usage of the photorealistic simulator Habitat-Sim for CRL research. Overall, Avalanche RL attempts to unify continual reinforcement learning applications under a common framework, which we hope will foster the growth of the field.
    Effective Explanations for Entity Resolution Models. (arXiv:2203.12978v1 [cs.DB])
Entity resolution (ER) aims at matching records that refer to the same real-world entity. Although widely studied for the last 50 years, ER still represents a challenging data management problem, and several recent works have started to investigate the opportunity of applying deep learning (DL) techniques to solve this problem. In this paper, we study the fundamental problem of the explainability of DL solutions for ER. Understanding the matching predictions of an ER solution is indeed crucial to assess the trustworthiness of the DL model and to discover its biases. We treat the DL model as a black-box classifier and, while previous approaches that provide explanations for DL predictions are agnostic to the classification task, we propose the CERTA approach, which is aware of the semantics of the ER problem. Our approach produces both saliency explanations, which associate each attribute with a saliency score, and counterfactual explanations, which provide examples of values that can flip the prediction. CERTA builds on a probabilistic framework that computes the explanations by evaluating the outcomes produced by using perturbed copies of the input records. We experimentally evaluate CERTA's explanations of state-of-the-art ER solutions based on DL models using publicly available datasets, and demonstrate the effectiveness of CERTA over recently proposed methods for this problem.
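Perturbation-based saliency of the kind described above can be sketched with a toy black-box matcher: blank out one attribute at a time and record how far the match probability falls. The matcher and records below are illustrative only, not CERTA's probabilistic framework:

```python
def attribute_saliency(record_a, record_b, match_prob):
    """Perturbation-based saliency: blank one attribute at a time and
    record how far the black-box matcher's P(match) drops."""
    base = match_prob(record_a, record_b)
    saliency = {}
    for attr in record_a:
        a = {k: ("" if k == attr else v) for k, v in record_a.items()}
        b = {k: ("" if k == attr else v) for k, v in record_b.items()}
        saliency[attr] = base - match_prob(a, b)
    return saliency

def toy_match(a, b):
    """Toy matcher: fraction of equal, non-empty attribute values."""
    return sum(1 for k in a if a[k] and a[k] == b.get(k)) / len(a)

a = {"name": "ACME Corp", "city": "Berlin"}
b = {"name": "ACME Corp", "city": "Munich"}
print(attribute_saliency(a, b, toy_match))  # -> {'name': 0.5, 'city': 0.0}
```

Here the `name` attribute carries all of the matching signal, so blanking it drops P(match) the most.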
    Learning Optimal Strategies for Temporal Tasks in Stochastic Games. (arXiv:2102.04307v2 [cs.AI] UPDATED)
    Synthesis from linear temporal logic (LTL) specifications provides assured controllers for autonomous systems operating in stochastic and potentially adversarial environments. Automatic synthesis tools, however, require a model of the environment to construct controllers. In this work, we introduce a model-free reinforcement learning (RL) approach that derives controllers from given LTL specifications even when the environment is completely unknown. We model the problem of satisfying the LTL specifications as a stochastic game (SG) between the controller and the adversarial environment; we then learn optimal controller strategies that maximize the probability of satisfying the LTL specifications against the worst-case environment behavior. We first construct a product game using the deterministic parity automaton (DPA) translated from the given LTL specification. By deriving distinct rewards and discount factors from the acceptance condition of the DPA, we reduce the maximization of the worst-case probability of satisfying the LTL specification into the maximization of a discounted reward objective in the product game; this allows for the use of model-free RL algorithms to learn an optimal controller strategy. To deal with the common scalability problems when the number of colors defining the acceptance condition of the DPA is large, we propose a lazy color generation method where distinct rewards and discount factors are utilized only when needed, and an approximate method where the controller eventually focuses on only one color. In several case studies, we show that our approach is scalable to a wide range of LTL formulas, significantly outperforming existing methods for learning controllers from LTL specifications in SGs.
    SwiftAgg+: Achieving Asymptotically Optimal Communication Load in Secure Aggregation for Federated Learning. (arXiv:2203.13060v1 [cs.IT])
We propose SwiftAgg+, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N\in\mathbb{N}$ distributed users, each of size $L \in \mathbb{N}$, trained on their local data, in a privacy-preserving manner. SwiftAgg+ can significantly reduce the communication overheads without any compromise on security, and achieve the optimum communication load within a diminishing gap. Specifically, in presence of at most $D$ dropout users, SwiftAgg+ achieves average per-user communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ and the server communication load of $(1+\mathcal{O}(\frac{1}{N}))L$, with a worst-case information-theoretic security guarantee, against any subset of up to $T$ semi-honest users who may also collude with the curious server. The proposed SwiftAgg+ also has the flexibility to reduce the number of active communication links at the cost of increasing the communication load between the users and the server. In particular, for any $K\in\mathbb{N}$, SwiftAgg+ can achieve the uplink communication load of $(1+\frac{T}{K})L$, and per-user communication load of up to $(1-\frac{1}{N})(1+\frac{T+D}{K})L$, where the number of pair-wise active connections in the network is $\frac{N}{2}(K+T+D+1)$.
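The core mechanism that lets a server learn only the aggregate, pairwise additive masks that cancel in the sum, can be sketched in a few lines; SwiftAgg+ builds on this idea with the grouping and dropout resilience that this toy version omits:

```python
import random

def masked_uploads(models, modulus=2 ** 16):
    """Each user i adds the masks it shares with later users and
    subtracts those shared with earlier users; the masks cancel in the
    sum, so the server learns only the aggregate of the local models."""
    n = len(models)
    masks = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            masks[i][j] = random.randrange(modulus)  # pairwise shared secret
            masks[j][i] = -masks[i][j]
    return [(m + sum(masks[i])) % modulus for i, m in enumerate(models)]

random.seed(1)
uploads = masked_uploads([3, 5, 7])      # toy scalar "local models"
print(sum(uploads) % 2 ** 16)            # -> 15, the aggregate and nothing else
```

Each individual upload is uniformly distributed modulo the modulus, so it reveals nothing about the corresponding local model on its own.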
    Algorithm Fairness in AI for Medicine and Healthcare. (arXiv:2110.00603v2 [cs.CV] UPDATED)
In the current development and deployment of many artificial intelligence (AI) systems in healthcare, algorithm fairness is a challenging problem in delivering equitable care. Recent evaluations of AI models stratified across race sub-populations have revealed inequalities in how patients are diagnosed, given treatments, and billed for healthcare costs. In this perspective article, we summarize the intersectional field of fairness in machine learning in the context of current issues in healthcare, and outline how algorithmic biases (e.g., image acquisition, genetic variation, intra-observer labeling variability) arise in current clinical workflows, along with the resulting healthcare disparities. Lastly, we also review emerging technology for mitigating bias via federated learning, disentanglement, and model explainability, and their role in AI-SaMD development.
    Online Enhanced Semantic Hashing: Towards Effective and Efficient Retrieval for Streaming Multi-Modal Data. (arXiv:2109.04260v2 [cs.MM] UPDATED)
With the vigorous development of multimedia equipment and applications, efficient retrieval of large-scale multi-modal data has become a trendy research topic. Among these approaches, hashing has become a prevalent choice due to its retrieval efficiency and low storage cost. Although multi-modal hashing has drawn lots of attention in recent years, some problems remain. First, existing methods are mainly designed in batch mode and are not able to efficiently handle streaming multi-modal data. Second, all existing online multi-modal hashing methods fail to effectively handle unseen new classes which arrive continuously with streaming data chunks. In this paper, we propose a new model, termed Online enhAnced SemantIc haShing (OASIS). We design a novel semantic-enhanced representation for data, which helps handle newly arriving classes, and thereby construct an enhanced semantic objective function. An efficient and effective discrete online optimization algorithm is further proposed for OASIS. Extensive experiments show that our method outperforms state-of-the-art models. For good reproducibility and to benefit the community, our code and data are already available in the supplementary material and will be made publicly available.
    TCN Mapping Optimization for Ultra-Low Power Time-Series Edge Inference. (arXiv:2203.12925v1 [cs.LG])
    Temporal Convolutional Networks (TCNs) are emerging lightweight Deep Learning models for Time Series analysis. We introduce an automated exploration approach and a library of optimized kernels to map TCNs on Parallel Ultra-Low Power (PULP) microcontrollers. Our approach minimizes latency and energy by exploiting a layer tiling optimizer to jointly find the tiling dimensions and select among alternative implementations of the causal and dilated 1D-convolution operations at the core of TCNs. We benchmark our approach on a commercial PULP device, achieving up to 103X lower latency and 20.3X lower energy than the Cube-AI toolkit executed on the STM32L4 and from 2.9X to 26.6X lower energy compared to commercial closed-source and academic open-source approaches on the same hardware target.
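The causal, dilated 1D convolution that the tiling optimizer selects implementations for can be sketched as follows; this is a pure-Python reference (no PULP-specific tiling or kernel selection), with illustrative weights:

```python
# Sketch of the causal, dilated 1D convolution at the core of a TCN layer.
# y[t] = sum_k weights[k] * x[t - k*dilation], with zero left-padding so
# each output only depends on current and past inputs.

def causal_dilated_conv1d(x, weights, dilation=1):
    K = len(weights)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(K):
            idx = t - k * dilation
            if idx >= 0:  # zero padding on the left keeps the op causal
                acc += weights[k] * x[idx]
        y.append(acc)
    return y

signal = [1.0, 2.0, 3.0, 4.0, 5.0]
out = causal_dilated_conv1d(signal, weights=[0.5, 0.5], dilation=2)
# Output has the same length as the input; each y[t] only sees x[<= t].
```

An optimized kernel library would tile this loop nest over the microcontroller's memory hierarchy; the arithmetic itself is unchanged.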
    Decouple-and-Sample: Protecting sensitive information in task agnostic data release. (arXiv:2203.13204v1 [cs.CR])
    We propose sanitizer, a framework for secure and task-agnostic data release. While releasing datasets continues to make a big impact in various applications of computer vision, its impact is mostly realized when data sharing is not inhibited by privacy concerns. We alleviate these concerns by sanitizing datasets in a two-stage process. First, we introduce a global decoupling stage for decomposing raw data into sensitive and non-sensitive latent representations. Second, we design a local sampling stage to synthetically generate sensitive information with differential privacy and merge it with non-sensitive latent features to create a useful representation while preserving privacy. This newly formed latent information is a task-agnostic representation of the original dataset with anonymized sensitive information. While most algorithms sanitize data in a task-dependent manner, a few task-agnostic sanitization techniques sanitize data by censoring sensitive information. In this work, we show that a better privacy-utility trade-off is achieved if sensitive information can be synthesized privately. We validate the effectiveness of the sanitizer by outperforming state-of-the-art baselines on existing benchmark tasks and by demonstrating tasks that are not possible using existing techniques.
    Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance. (arXiv:2203.08368v2 [cs.LG] UPDATED)
    The exponentially large discrete search space in mixed-precision quantization (MPQ) makes it hard to determine the optimal bit-width for each layer. Previous works usually resort to iterative search methods on the training set, which consume hundreds or even thousands of GPU-hours. In this study, we reveal that some unique learnable parameters in quantization, namely the scale factors in the quantizer, can serve as importance indicators of a layer, reflecting the contribution of that layer to the final accuracy at certain bit-widths. These importance indicators naturally perceive the numerical transformation during quantization-aware training and can therefore precisely provide quantization sensitivity metrics for each layer. However, a deep network always contains hundreds of such indicators, and training them one by one would lead to excessive time cost. To overcome this issue, we propose a joint training scheme that obtains all indicators at once, considerably speeding up indicator training by parallelizing the originally sequential training processes. With these learned importance indicators, we formulate the MPQ search problem as a one-time integer linear programming (ILP) problem. This avoids iterative search and significantly reduces search time without limiting the bit-width search space. For example, MPQ search on ResNet18 with our indicators takes only 0.06 seconds. Extensive experiments further show that our approach achieves SOTA accuracy on ImageNet for a wide range of models under various constraints (e.g., BitOps, compression rate).
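The search formulation can be illustrated with a toy stand-in for the one-shot ILP: pick one bit-width per layer to maximize summed importance under a BitOps-style budget. The importances and costs below are invented for illustration, and exhaustive enumeration replaces a real ILP solver:

```python
from itertools import product

def search_bitwidths(importance, cost, budget):
    """importance[l][b] and cost[l][b] are indexed by layer l and bit-width choice b.
    Returns the feasible assignment maximizing total importance."""
    best, best_assign = float("-inf"), None
    n_layers = len(importance)
    n_choices = len(importance[0])
    for assign in product(range(n_choices), repeat=n_layers):
        total_cost = sum(cost[l][b] for l, b in enumerate(assign))
        if total_cost > budget:  # BitOps-style constraint
            continue
        score = sum(importance[l][b] for l, b in enumerate(assign))
        if score > best:
            best, best_assign = score, assign
    return best_assign, best

# Two layers, choices = (4-bit, 8-bit): higher bit-widths cost more, help more.
imp = [[0.6, 0.9], [0.3, 0.4]]
cst = [[1, 4], [1, 4]]
print(search_bitwidths(imp, cst, budget=5))  # layer 0 gets 8-bit, layer 1 stays 4-bit
```

In the paper's setting, the importance table would come from the learned scale-factor indicators, and the ILP makes this search tractable for hundreds of layers.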
    Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer. (arXiv:2203.13248v1 [cs.CV])
    Recent studies on StyleGAN show high performance on artistic portrait generation by transfer learning with limited data. In this paper, we explore more challenging exemplar-based high-resolution portrait style transfer by introducing a novel DualStyleGAN with flexible control of dual styles of the original face domain and the extended artistic portrait domain. Different from StyleGAN, DualStyleGAN provides a natural way of style transfer by characterizing the content and style of a portrait with an intrinsic style path and a new extrinsic style path, respectively. The delicately designed extrinsic style path enables our model to modulate both the color and complex structural styles hierarchically to precisely pastiche the style example. Furthermore, a novel progressive fine-tuning scheme is introduced to smoothly transform the generative space of the model to the target domain, even with the above modifications on the network architecture. Experiments demonstrate the superiority of DualStyleGAN over state-of-the-art methods in high-quality portrait style transfer and flexible style control.
    On the Implicit Bias of Gradient Descent for Temporal Extrapolation. (arXiv:2202.04302v2 [cs.LG] UPDATED)
    When using recurrent neural networks (RNNs) it is common practice to apply trained models to sequences longer than those seen in training. This "extrapolating" usage deviates from the traditional statistical learning setup where guarantees are provided under the assumption that train and test distributions are identical. Here we set out to understand when RNNs can extrapolate, focusing on a simple case where the data generating distribution is memoryless. We first show that even with infinite training data, there exist RNN models that interpolate perfectly (i.e., they fit the training data) yet extrapolate poorly to longer sequences. We then show that if gradient descent is used for training, learning will converge to perfect extrapolation under certain assumptions on initialization. Our results complement recent studies on the implicit bias of gradient descent, showing that it plays a key role in extrapolation when learning temporal prediction models.
    Kullback-Leibler control for discrete-time nonlinear systems on continuous spaces. (arXiv:2203.12864v1 [eess.SY])
    Kullback-Leibler (KL) control enables efficient numerical methods for nonlinear optimal control problems. The crucial assumption of KL control is full controllability of the transition distribution. However, this assumption is often violated when the dynamics evolve in a continuous space. Consequently, applying KL control to problems with continuous spaces requires some approximation, which leads to a loss of optimality. To avoid such approximation, in this paper we reformulate the KL control problem for continuous spaces so that it does not require unrealistic assumptions. The key difference between the original and reformulated KL control is that the former measures the control effort by the KL divergence between the controlled and uncontrolled transition distributions, while the latter replaces the uncontrolled transition with a noise-driven transition. We show that the reformulated KL control admits efficient numerical algorithms like the original one, without unreasonable assumptions. Specifically, the associated value function can be computed with a Monte Carlo method based on its path integral representation.
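The Monte Carlo path-integral computation can be sketched for a scalar system: estimate V(x) ≈ -log (1/N) Σᵢ exp(-Sᵢ), where Sᵢ is the accumulated state cost of a rollout under the noise-driven (uncontrolled) dynamics. The dynamics and quadratic cost below are illustrative choices, not taken from the paper:

```python
import math
import random

def value_estimate(x0, horizon=10, n_samples=2000, seed=0):
    """Path-integral value estimate via Monte Carlo rollouts of
    noise-driven dynamics (illustrative scalar system)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x, cost = x0, 0.0
        for _ in range(horizon):
            x = 0.9 * x + rng.gauss(0.0, 0.1)  # noise-driven transition
            cost += x * x                      # quadratic state cost
        total += math.exp(-cost)
    return -math.log(total / n_samples)

print(value_estimate(0.0) < value_estimate(2.0))  # starting far from 0 costs more
```

The exponential averaging of rollout costs is what makes the value function computable without solving the Bellman equation directly.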
    Visual Microfossil Identification via Deep Metric Learning. (arXiv:2112.09490v3 [cs.CV] UPDATED)
    We apply deep metric learning for the first time to the problem of classifying planktic foraminifer shells on microscopic images. This species recognition task is an important information source and scientific pillar for reconstructing past climates. All foraminifer CNN recognition pipelines in the literature produce black-box classifiers that lack visualization options for human experts and cannot be applied to open-set problems. Here, we benchmark metric learning against these pipelines, produce the first scientific visualization of the phenotypic planktic foraminifer morphology space, and demonstrate that metric learning can be used to cluster species unseen during training. We show that metric learning outperforms all published CNN-based state-of-the-art benchmarks in this domain. We evaluate our approach on the 34,640 expert-annotated images of the Endless Forams public library of 35 modern planktic foraminifera species. Our results on these data show a leading 92% accuracy (0.84 F1-score) in reproducing expert labels on withheld test data, and 66.5% accuracy (0.70 F1-score) when clustering species never encountered in training. We conclude that metric learning is highly effective for this domain and serves as an important tool towards expert-in-the-loop automation of microfossil identification. Key code, network weights, and data splits are published with this paper for full reproducibility.
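The core objective in this kind of metric learning is to pull same-species embeddings together and push different-species embeddings apart. A minimal triplet-loss computation illustrates this; the 2-D points are toy embeddings, not foraminifer features:

```python
# Triplet loss: encourage d(anchor, positive) + margin <= d(anchor, negative).

def triplet_loss(anchor, positive, negative, margin=1.0):
    d = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)

# Anchor close to the positive and far from the negative -> zero loss.
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 0.0]))  # 0.0
# Negative too close to the anchor -> positive loss that training reduces.
print(triplet_loss([0.0, 0.0], [1.0, 0.0], [1.2, 0.0]))
```

Because the model learns distances rather than fixed class boundaries, unseen species can still be clustered by embedding proximity, which is what enables the open-set results above.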
    Extended critical regimes of deep neural networks. (arXiv:2203.12967v1 [cs.LG])
    Deep neural networks (DNNs) have been successfully applied to many real-world problems, but a complete understanding of their dynamical and computational principles is still lacking. Conventional theoretical frameworks for analysing DNNs often assume random networks with coupling weights obeying Gaussian statistics. However, non-Gaussian, heavy-tailed coupling is a ubiquitous phenomenon in DNNs. Here, by weaving together theories of heavy-tailed random matrices and non-equilibrium statistical physics, we develop a new type of mean field theory for DNNs which predicts that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters. In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers. We further elucidate that the extended criticality endows DNNs with profound computational advantages: balancing the contraction as well as expansion of internal neural representations and speeding up training processes, hence providing a theoretical guide for the design of efficient neural architectures.
    HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement. (arXiv:2203.13086v1 [cs.SD])
    Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models. In this paper, we show that this success can be extended to other tasks of conditional audio generation. In particular, building upon HiFi vocoders, we propose a novel HiFi++ general framework for neural vocoding, bandwidth extension, and speech enhancement. We show that with the improved generator architecture and simplified multi-discriminator training, HiFi++ performs on par with the state-of-the-art in these tasks while using significantly less memory and computational resources. The effectiveness of our approach is validated through a series of extensive experiments.
    Token Dropping for Efficient BERT Pretraining. (arXiv:2203.13240v1 [cs.CL])
    Transformer-based models generally allocate the same amount of computation for each token in a given sequence. We develop a simple but effective "token dropping" method to accelerate the pretraining of transformer models, such as BERT, without degrading its performance on downstream tasks. In short, we drop unimportant tokens starting from an intermediate layer in the model to make the model focus on important tokens; the dropped tokens are later picked up by the last layer of the model so that the model still produces full-length sequences. We leverage the already built-in masked language modeling (MLM) loss to identify unimportant tokens with practically no computational overhead. In our experiments, this simple approach reduces the pretraining cost of BERT by 25% while achieving similar overall fine-tuning performance on standard downstream tasks.
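The drop-and-restore idea can be sketched in a few lines: from an intermediate layer onward, keep only the tokens deemed important (here, the ones with the highest per-token loss), apply the expensive middle layers to those alone, then restore the dropped tokens before the last layer so the output is full-length. The scores and the "processing" step are placeholders, not BERT internals:

```python
def drop_and_restore(tokens, scores, keep):
    # Keep the `keep` highest-scoring (most important) tokens, in order.
    kept = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:keep])
    # Stand-in for the expensive middle layers, applied to kept tokens only.
    processed = {i: tokens[i].upper() for i in kept}
    # Dropped tokens are "picked up" again so the output is full-length.
    return [processed.get(i, tokens[i]) for i in range(len(tokens))]

toks = ["the", "cat", "sat", "on", "mats"]
mlm_loss = [0.1, 0.9, 0.8, 0.1, 0.7]   # low loss = easy, unimportant token
print(drop_and_restore(toks, mlm_loss, keep=3))
# ['the', 'CAT', 'SAT', 'on', 'MATS']
```

The compute saving comes from the middle layers seeing a shorter sequence, while reusing the MLM loss as the importance score adds essentially no overhead.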
    Contextual Model Aggregation for Fast and Robust Federated Learning in Edge Computing. (arXiv:2203.12738v1 [cs.LG])
    Federated learning is a prime candidate for distributed machine learning at the network edge due to its low communication complexity and privacy protection, among other attractive properties. However, existing algorithms face issues with slow convergence and/or robustness of performance due to the considerable heterogeneity of data distribution, computation and communication capability at the edge. In this work, we tackle both of these issues by focusing on the key component of model aggregation in federated learning systems and studying optimal algorithms to perform this task. Particularly, we propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction in each round of optimization. The aforementioned context-dependent bound is derived from the particular participating devices in that round and an assumption on smoothness of the overall loss function. We show that this aggregation leads to a definite reduction of loss function at every round. Furthermore, we can integrate our aggregation with many existing algorithms to obtain the contextual versions. Our experimental results demonstrate significant improvements in convergence speed and robustness of the contextual versions compared to the original algorithms. We also consider different variants of the contextual aggregation and show robust performance even in the most extreme settings.
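The aggregation step being refined is, in its simplest baseline form, a FedAvg-style weighted average of client model parameters. The sketch below weights by client sample counts; the contextual scheme in the paper would instead adapt the combination per round based on which devices participate:

```python
def aggregate(client_params, client_sizes):
    """Weighted average of per-client parameter vectors (FedAvg baseline)."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(p[j] * s for p, s in zip(client_params, client_sizes)) / total
        for j in range(n_params)
    ]

# Two clients, two parameters; the larger client dominates the average.
print(aggregate([[1.0, 0.0], [3.0, 2.0]], [10, 30]))  # [2.5, 1.5]
```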
    A Supervised Machine Learning Approach for Sequence Based Protein-protein Interaction (PPI) Prediction. (arXiv:2203.12659v1 [cs.LG])
    Computational protein-protein interaction (PPI) prediction techniques can contribute greatly to reducing the time, cost, and false-positive interactions of experimental approaches. Sequence is one of the key primary sources of information about proteins and plays a crucial role in PPI prediction. Several machine learning approaches have been applied to exploit the characteristics of PPI datasets. However, these datasets greatly influence the performance of predictive models, so care should be taken with both dataset curation and the design of predictive models. Here, we describe our submitted solution, with results, for the SeqPIP competition, whose objective was to develop comprehensive PPI predictive models from sequence information using high-quality, bias-free interaction datasets. A training set of 2000 positive and 2000 negative interactions with sequences was provided. Our method was evaluated on three independent high-quality interaction test datasets and against other competitors' solutions.
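A classic sequence-only baseline feature for this kind of model is amino-acid composition: the normalized frequency of each residue in a protein sequence. This is one of many possible encodings, not necessarily the competition entry's exact one:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aa_composition(seq):
    """Normalized frequency of each standard amino acid in `seq`."""
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}

feats = aa_composition("MKVLAAK")  # toy sequence
# A pair feature vector can then be the concatenation for both proteins.
print(round(feats["A"], 3), round(feats["K"], 3))  # 0.286 0.286
```

Any standard classifier can consume the concatenated pair vector, which is why careful dataset curation (bias-free negatives) matters more than the model choice in this setting.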
    Computed Tomography Reconstruction using Generative Energy-Based Priors. (arXiv:2203.12658v1 [eess.IV])
    In the past decades, Computed Tomography (CT) has established itself as one of the most important imaging techniques in medicine. Today, the applicability of CT is only limited by the deposited radiation dose, reduction of which manifests in noisy or incomplete measurements. Thus, the need for robust reconstruction algorithms arises. In this work, we learn a parametric regularizer with a global receptive field by maximizing its likelihood on reference CT data. Due to this unsupervised learning strategy, our trained regularizer truly represents higher-level domain statistics, which we empirically demonstrate by synthesizing CT images. Moreover, this regularizer can easily be applied to different CT reconstruction problems by embedding it in a variational framework, which increases flexibility and interpretability compared to feed-forward learning-based approaches. In addition, the accompanying probabilistic perspective enables experts to explore the full posterior distribution and may quantify uncertainty of the reconstruction approach. We apply the regularizer to limited-angle and few-view CT reconstruction problems, where it outperforms traditional reconstruction algorithms by a large margin.
    Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs. (arXiv:2203.12679v1 [cs.AI])
    Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous MDP planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven effective for optimizing DRPs in nonlinear continuous MDPs, but it requires a large number of sampled trajectories to learn effectively and can suffer from high variance in solution quality. In this work, we revisit the overall model-based DRP objective and instead take a minorization-maximization perspective to iteratively optimize the DRP w.r.t. a locally tight lower-bounded objective. This novel formulation of DRP learning as iterative lower bound optimization (ILBO) is particularly appealing because (i) each step is structurally easier to optimize than the overall objective, (ii) it guarantees a monotonically improving objective under certain theoretical conditions, and (iii) it reuses samples between iterations thus lowering sample complexity. Empirical evaluation confirms that ILBO is significantly more sample-efficient than the state-of-the-art DRP planner and consistently produces better solution quality with lower variance. We additionally demonstrate that ILBO generalizes well to new problem instances (i.e., different initial states) without requiring retraining.
    Are Evolutionary Algorithms Safe Optimizers?. (arXiv:2203.12622v1 [cs.NE])
    We consider a type of constrained optimization problem in which the violation of a constraint leads to an irrevocable loss, such as breakage of a valuable experimental resource/platform or loss of human life. Such problems are referred to as safe optimization problems (SafeOPs). While SafeOPs have received attention in the machine learning community in recent years, there has been little interest in the evolutionary computation (EC) community despite some early attempts between 2009 and 2011. Moreover, there is a lack of acceptable guidelines on how to benchmark different algorithms for SafeOPs, an area in which the EC community has significant experience. Driven by the need for more efficient algorithms and benchmark guidelines for SafeOPs, the objective of this paper is to reignite the EC community's interest in this problem class. To achieve this we (i) provide a formal definition of SafeOPs and contrast it with other types of optimization problems familiar to the EC community, (ii) investigate the impact of key SafeOP parameters on the performance of selected safe optimization algorithms, (iii) benchmark EC methods against state-of-the-art safe optimization algorithms from the machine learning community, and (iv) provide an open-source Python framework to replicate and extend our work.
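The defining constraint of a SafeOP is that a candidate may only be *evaluated* if it is predicted safe, because an unsafe evaluation is irrevocable. A toy safe search loop makes this concrete; the safety predicate here is a hypothetical oracle, whereas real safe optimizers learn it from safe evaluations:

```python
def safe_search(candidates, objective, predicted_safe):
    """Greedy search that never evaluates candidates predicted unsafe."""
    best, best_val = None, float("-inf")
    for c in candidates:
        if not predicted_safe(c):   # an unsafe evaluation would be irrevocable
            continue
        v = objective(c)
        if v > best_val:
            best, best_val = c, v
    return best

# Objective peaks at x=6, but x >= 5 is unsafe: the safe optimum is x=4.
result = safe_search(range(10), lambda x: -(x - 6) ** 2, lambda x: x < 5)
print(result)  # 4
```

The interesting algorithmic questions, which the paper benchmarks, are how to learn `predicted_safe` conservatively and how much objective value that conservatism costs.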
    Learning Efficient Exploration through Human Seeded Rapidly-exploring Random Trees. (arXiv:2203.12774v1 [cs.LG])
    Modern computer games have extremely large state and action spaces. To detect bugs in these games' models, human testers play the games repeatedly, exploring them to find errors. Such game play is exhaustive and time consuming. Moreover, since robotics simulators depend on similar methods of model specification and debugging, finding errors in the model is also of interest to the robotics community, to ensure that robot behaviors and interactions are consistent in simulators. Previous methods have used reinforcement learning and search-based methods, including Rapidly-exploring Random Trees (RRT), to explore a game's state-action space to find bugs. However, such search- and exploration-based methods are not efficient at exploring the state-action space without a pre-defined heuristic. In this work we combine a human tester's expertise in solving games with the exhaustiveness of RRT to search a game's state space efficiently and with high coverage. This paper introduces human-seeded RRT (HS-RRT) and behavior-cloning-assisted RRT (CA-RRT), evaluated on the number of game states searched and the time taken to explore those game states. We compare our methods to an existing weighted RRT baseline previously studied for game exploration testing. We find that HS-RRT and CA-RRT both explore more game states in fewer tree expansions/iterations than the existing baseline. In each test, CA-RRT reached more states on average in the same number of iterations as RRT. In our tested environments, CA-RRT reached the same number of states as RRT with, on average, more than 5000 fewer iterations, almost a 50% reduction.
    Q-FW: A Hybrid Classical-Quantum Frank-Wolfe for Quadratic Binary Optimization. (arXiv:2203.12633v1 [cs.CV])
    We present a hybrid classical-quantum framework based on the Frank-Wolfe algorithm, Q-FW, for solving quadratic, linearly-constrained, binary optimization problems on quantum annealers (QA). The computational promise of quantum computers has motivated the re-design of various existing vision problems into quantum-friendly forms. Experimental QA realizations can solve a particular non-convex problem known as quadratic unconstrained binary optimization (QUBO). However, a naive QUBO cannot take into account restrictions on the parameters. To introduce additional structure in the parameter space, researchers have crafted ad-hoc solutions incorporating (linear) constraints in the form of regularizers. However, this comes at the expense of a hyper-parameter balancing the impact of regularization. To date, a true constrained solver for quadratic binary optimization (QBO) problems has been lacking. Q-FW first reformulates constrained QBO as a copositive program (CP), then employs Frank-Wolfe iterations to solve the CP while satisfying linear (in)equality constraints. This procedure unrolls the original constrained QBO into a set of unconstrained QUBOs, all of which are solved, in sequence, on a QA. We use the D-Wave Advantage QA to conduct synthetic and real experiments on two important computer vision problems, graph matching and permutation synchronization, which demonstrate that our approach is effective in alleviating the need for an explicit regularization coefficient.
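The classical Frank-Wolfe iteration that Q-FW builds on minimizes a smooth objective over a convex set by repeatedly solving a *linear* subproblem (here, picking the best polytope vertex) and stepping toward its solution. A toy 2-D quadratic over the probability simplex illustrates the loop; in Q-FW the linear subproblems become the QUBOs handed to the annealer:

```python
def frank_wolfe(grad, vertices, x0, iters=200):
    """Frank-Wolfe with the standard 2/(t+2) step over a vertex-described polytope."""
    x = list(x0)
    for t in range(iters):
        g = grad(x)
        # Linear minimization oracle: best vertex of the feasible polytope.
        s = min(vertices, key=lambda v: sum(gi * vi for gi, vi in zip(g, v)))
        step = 2.0 / (t + 2.0)
        x = [(1 - step) * xi + step * si for xi, si in zip(x, s)]
    return x

# Minimize (x0 - 0.2)^2 + (x1 - 0.8)^2 over the simplex {x >= 0, x0 + x1 = 1}.
grad = lambda x: [2 * (x[0] - 0.2), 2 * (x[1] - 0.8)]
print(frank_wolfe(grad, vertices=[(1.0, 0.0), (0.0, 1.0)], x0=[0.5, 0.5]))
```

Because every iterate is a convex combination of vertices, the constraints are satisfied throughout, which is the property Q-FW exploits to handle linear (in)equality constraints without regularizer tuning.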
    Bellman Residual Orthogonalization for Offline Reinforcement Learning. (arXiv:2203.12786v1 [cs.LG])
    We introduce a new reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a user-defined space of test functions. Focusing on applications to model-free offline RL with function approximation, we exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class. We prove an oracle inequality on our policy optimization procedure in terms of a trade-off between the value and uncertainty of an arbitrary comparator policy. Different choices of test function spaces allow us to tackle different problems within a common framework. We characterize the loss of efficiency in moving from on-policy to off-policy data using our procedures, and establish connections to concentrability coefficients studied in past work. We examine in depth the implementation of our methods with linear function approximation, and provide theoretical guarantees with polynomial-time implementations even when Bellman closure does not hold.
    Possibility Before Utility: Learning And Using Hierarchical Affordances. (arXiv:2203.12686v1 [cs.LG])
    Reinforcement learning algorithms struggle on tasks with complex hierarchical dependency structures. Humans and other intelligent agents do not waste time assessing the utility of every high-level action in existence, but instead only consider ones they deem possible in the first place. By focusing only on what is feasible, or "afforded", at the present moment, an agent can spend more time both evaluating the utility of and acting on what matters. To this end, we present Hierarchical Affordance Learning (HAL), a method that learns a model of hierarchical affordances in order to prune impossible subtasks for more effective learning. Existing works in hierarchical reinforcement learning provide agents with structural representations of subtasks but are not affordance-aware, and by grounding our definition of hierarchical affordances in the present state, our approach is more flexible than the multitude of approaches that ground their subtask dependencies in a symbolic history. While these logic-based methods often require complete knowledge of the subtask hierarchy, our approach is able to utilize incomplete and varying symbolic specifications. Furthermore, we demonstrate that relative to non-affordance-aware methods, HAL agents are better able to efficiently learn complex tasks, navigate environment stochasticity, and acquire diverse skills in the absence of extrinsic supervision -- all of which are hallmarks of human learning.
    Asynchronous Collaborative Learning Across Data Silos. (arXiv:2203.12637v1 [cs.LG])
    Machine learning algorithms can perform well when trained on large datasets. While large organisations often have considerable data assets, it can be difficult for these assets to be unified in a manner that makes training possible. Data is very often 'siloed' in different parts of the organisation, with little to no access between silos. This fragmentation of data assets is especially prevalent in heavily regulated industries like financial services or healthcare. In this paper we propose a framework to enable asynchronous collaborative training of machine learning models across data silos. This allows data science teams to collaboratively train a machine learning model, without sharing data with one another. Our proposed approach enhances conventional federated learning techniques to make them suitable for asynchronous training in this intra-organisation, cross-silo setting. We validate our proposed approach via extensive experiments.
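A common ingredient of asynchronous cross-silo training is staleness handling: when a silo's update arrives late, discount it before merging it into the global model. The discount rule below is one simple illustrative choice, not necessarily the paper's exact scheme:

```python
def apply_update(global_params, update, staleness, base_lr=1.0):
    """Merge a silo's parameter update, down-weighting it by its staleness
    (the number of global steps taken since the silo pulled the model)."""
    lr = base_lr / (1.0 + staleness)   # older updates get smaller weight
    return [g + lr * u for g, u in zip(global_params, update)]

model = [0.0, 0.0]
model = apply_update(model, update=[1.0, -1.0], staleness=0)  # fresh: full step
model = apply_update(model, update=[1.0, -1.0], staleness=3)  # stale: 1/4 step
print(model)  # [1.25, -1.25]
```

This lets each silo's data science team contribute on its own schedule, which is the practical requirement in the intra-organisation setting described above.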
    Shared Data and Algorithms for Deep Learning in Fundamental Physics. (arXiv:2107.00656v2 [cs.LG] UPDATED)
    We introduce a Python package that provides simple and unified access to a collection of datasets from fundamental physics research - including particle physics, astroparticle physics, and hadron and nuclear physics - for supervised machine learning studies. The datasets contain hadronic top quarks, cosmic-ray induced air showers, phase transitions in hadronic matter, and generator-level histories. While public datasets from multiple fundamental physics disciplines already exist, the common interface and provided reference models simplify future work on cross-disciplinary machine learning and transfer learning in fundamental physics. We discuss the design and structure of the package and outline how additional datasets can be submitted for inclusion. As a showcase application, we present a simple yet flexible graph-based neural network architecture that can easily be applied to a wide range of supervised learning tasks. We show that our approach reaches performance close to dedicated methods on all datasets. To simplify adaptation for various problems, we provide easy-to-follow instructions on how graph-based representations of data structures, relevant for fundamental physics, can be constructed, and provide code implementations for several of them. Implementations are also provided for our proposed method and all reference algorithms.
    Dynamically-Scaled Deep Canonical Correlation Analysis. (arXiv:2203.12377v2 [cs.LG] UPDATED)
    Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them. Several variants of CCA have been introduced in the literature, in particular, variants based on deep neural networks for learning highly correlated nonlinear transformations of two views. As these models are parameterized conventionally, their learnable parameters remain independent of the inputs after the training process, which may limit their capacity for learning highly correlated representations. We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model. In our deep-CCA models, the parameters of the last layer are scaled by a second neural network that is conditioned on the model's input, resulting in a parameterization that is dependent on the input samples. We evaluate our model on multiple datasets and demonstrate that the learned representations are more correlated in comparison to the conventionally-parameterized CCA-based models and also obtain preferable retrieval results. Our code is available at https://github.com/tomerfr/DynamicallyScaledDeepCCA.
    On Understanding the Influence of Controllable Factors with a Feature Attribution Algorithm: a Medical Case Study. (arXiv:2203.12701v1 [cs.AI])
    Feature attribution XAI algorithms enable their users to gain insight into the underlying patterns of large datasets through their feature importance calculation. Existing feature attribution algorithms treat all features in a dataset homogeneously, which may lead to misinterpretation of consequences of changing feature values. In this work, we consider partitioning features into controllable and uncontrollable parts and propose the Controllable fActor Feature Attribution (CAFA) approach to compute the relative importance of controllable features. We carried out experiments applying CAFA to two existing datasets and our own COVID-19 non-pharmaceutical control measures dataset. Experimental results show that with CAFA, we are able to exclude influences from uncontrollable features in our explanation while keeping the full dataset for prediction.
    An Exploration of Learnt Representations of W Jets. (arXiv:2109.10919v2 [hep-ph] UPDATED)
    I present a Variational Autoencoder (VAE) trained on collider physics data (specifically boosted $W$ jets), with reconstruction error given by an approximation to the Earth Mover's Distance (EMD) between input and output jets. This VAE learns a concrete representation of the data manifold, with semantically meaningful and interpretable latent space directions which are hierarchically organized in terms of their relation to physical EMD scales in the underlying physical generative process. A hyperparameter $\beta$ controls the resolution at which the VAE is sensitive to structures in the data manifold. The variation of the latent space structure with $\beta$, and the scaling of some VAE properties, provide insight into the scale-dependent structure of the dataset and its information complexity. I introduce two measures of the dimensionality of the learnt representation that are calculated from this scaling.
    On the Applicability of ML Fairness Notions. (arXiv:2006.16745v3 [cs.LG] UPDATED)
    Fairness emerged as an important requirement to guarantee that Machine Learning (ML) predictive systems do not discriminate against specific individuals or entire sub-populations, in particular, minorities. Given the inherent subjectivity of the concept of fairness, several notions of fairness have been introduced in the literature. This paper is a survey that illustrates the subtleties between fairness notions through a large number of examples and scenarios. In addition, unlike other surveys in the literature, it addresses the question of: which notion of fairness is most suited to a given real-world scenario and why? Our attempt to answer this question consists in (1) identifying the set of fairness-related characteristics of the real-world scenario at hand, (2) analyzing the behavior of each fairness notion, and then (3) fitting these two elements to recommend the most suitable fairness notion in every specific setup. The results are summarized in a decision diagram that can be used by practitioners and policymakers to navigate the relatively large catalog of ML fairness notions.
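One concrete instance of a notion from such a catalog is demographic parity, which compares positive-prediction rates across groups. The per-group predictions below are toy numbers for illustration:

```python
def positive_rate(predictions):
    """Fraction of instances predicted positive (1) in a group."""
    return sum(predictions) / len(predictions)

def demographic_parity_gap(preds_group_a, preds_group_b):
    """Zero gap means the classifier satisfies demographic parity exactly."""
    return abs(positive_rate(preds_group_a) - positive_rate(preds_group_b))

group_a = [1, 1, 0, 0]   # 50% predicted positive
group_b = [1, 0, 0, 0]   # 25% predicted positive
print(demographic_parity_gap(group_a, group_b))  # 0.25
```

Other notions in the survey's catalog (e.g., equalized odds) condition these rates on the true label, which is precisely why the choice of notion must fit the scenario's characteristics.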
    Interpretable Prediction of Pulmonary Hypertension in Newborns using Echocardiograms. (arXiv:2203.13038v1 [eess.IV])
    Pulmonary hypertension (PH) in newborns and infants is a complex condition associated with several pulmonary, cardiac, and systemic diseases contributing to morbidity and mortality. Therefore, accurate and early detection of PH is crucial for successful management. Echocardiography is the primary diagnostic tool in pediatrics, but human assessment of echocardiograms is both time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we present an interpretable multi-view video-based deep learning approach to predict PH for a cohort of 194 newborns using echocardiograms. We use spatio-temporal convolutional architectures for the prediction of PH from each view, and aggregate the predictions of the different views using majority voting. To the best of our knowledge, this is the first work for an automated assessment of PH in newborns using echocardiograms. Our results show a mean F1-score of 0.84 for severity prediction and 0.92 for binary detection using 10-fold cross-validation. We complement our predictions with saliency maps and show that the learned model focuses on clinically relevant cardiac structures, motivating its usage in clinical practice.
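    The per-view aggregation step is plain majority voting, which can be sketched in a few lines (the class labels below are illustrative, not the paper's exact encoding):

```python
from collections import Counter

def majority_vote(view_predictions):
    """Aggregate per-view class predictions by majority vote:
    the class predicted by the most views wins."""
    counts = Counter(view_predictions)
    return counts.most_common(1)[0][0]

print(majority_vote(["PH", "PH", "no-PH"]))  # PH
```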
    Constrained Parameter Inference as a Principle for Learning. (arXiv:2203.13203v1 [cs.NE])
    Learning in biological and artificial neural networks is often framed as a problem in which targeted error signals guide parameter updating for more optimal network behaviour. Backpropagation of error (BP) is an example of such an approach and has proven to be a highly successful application of stochastic gradient descent to deep neural networks. However, BP relies on the global transmission of gradient information and has therefore been criticised for its biological implausibility. We propose constrained parameter inference (COPI) as a new principle for learning. COPI allows for the estimation of network parameters under the constraints of decorrelated neural inputs and top-down perturbations of neural states. We show that COPI not only is more biologically plausible but also provides distinct advantages for fast learning, compared with standard backpropagation of error.
    DyRep: Bootstrapping Training with Dynamic Re-parameterization. (arXiv:2203.12868v1 [cs.CV])
    Structural re-parameterization (Rep) methods achieve noticeable improvements on simple VGG-style networks. Despite their prevalence, current Rep methods simply re-parameterize all operations into an augmented network, including those that rarely contribute to the model's performance. As such, the price to pay is an expensive computational overhead to manipulate these unnecessary behaviors. To eliminate the above caveats, we aim to bootstrap the training with minimal cost by devising a dynamic re-parameterization (DyRep) method, which encodes the Rep technique into the training process that dynamically evolves the network structures. Concretely, our proposal adaptively finds the operations which contribute most to the loss in the network, and applies Rep to enhance their representational capacity. Besides, to suppress the noisy and redundant operations introduced by Rep, we devise a de-parameterization technique for a more compact re-parameterization. In this regard, DyRep is more efficient than Rep since it smoothly evolves the given network instead of constructing an over-parameterized network. Experimental results demonstrate our effectiveness, e.g., DyRep improves the accuracy of ResNet-18 by $2.04\%$ on ImageNet and reduces runtime by $22\%$ over the baseline. Code is available at: https://github.com/hunto/DyRep.
    Text to Image Generation with Semantic-Spatial Aware GAN. (arXiv:2104.00567v6 [cs.CV] UPDATED)
    Text-to-image synthesis (T2I) aims to generate photo-realistic images which are semantically consistent with the text descriptions. Existing methods are usually built upon conditional generative adversarial networks (GANs) and initialize an image from noise with sentence embedding, and then refine the features with fine-grained word embedding iteratively. A close inspection of their generated images reveals a major limitation: even though the generated image holistically matches the description, individual image regions or parts of some objects are often not recognizable or consistent with words in the sentence, e.g. "a white crown". To address this problem, we propose a novel framework Semantic-Spatial Aware GAN for synthesizing images from input text. Concretely, we introduce a simple and effective Semantic-Spatial Aware block, which (1) learns semantic-adaptive transformation conditioned on text to effectively fuse text features and image features, and (2) learns a semantic mask in a weakly-supervised way that depends on the current text-image fusion process in order to guide the transformation spatially. Experiments on the challenging COCO and CUB bird datasets demonstrate the advantage of our method over the recent state-of-the-art approaches, regarding both visual fidelity and alignment with input text description.  ( 2 min )
    Graph Neural Networks in Particle Physics: Implementations, Innovations, and Challenges. (arXiv:2203.12852v1 [hep-ex])
    Many physical systems can be best understood as sets of discrete data with associated relationships. Where previously these sets of data have been formulated as series or image data to match the available machine learning architectures, with the advent of graph neural networks (GNNs), these systems can be learned natively as graphs. This allows a wide variety of high- and low-level physical features to be attached to measurements and, by the same token, a wide variety of HEP tasks to be accomplished by the same GNN architectures. GNNs have found powerful use-cases in reconstruction, tagging, generation and end-to-end analysis. With the wide-spread adoption of GNNs in industry, the HEP community is well-placed to benefit from rapid improvements in GNN latency and memory usage. However, industry use-cases are not perfectly aligned with HEP and much work needs to be done to best match unique GNN capabilities to unique HEP obstacles. We present here a range of these capabilities, noting which are already well-adopted in HEP communities and which are still immature. We hope to capture the landscape of graph techniques in machine learning as well as point out the most significant gaps that are inhibiting potentially large leaps in research.  ( 2 min )
    MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data. (arXiv:2203.12369v2 [cs.SD] UPDATED)
    Training of speech enhancement systems often does not incorporate knowledge of human perception and thus can lead to unnatural sounding results. Incorporating psychoacoustically motivated speech perception metrics as part of model training via a predictor network has recently gained interest. However, the performance of such predictors is limited by the distribution of metric scores that appear in the training data. In this work, we propose MetricGAN+/- (an extension of MetricGAN+, one such metric-motivated system) which introduces an additional network - a "de-generator" which attempts to improve the robustness of the prediction network (and by extension of the generator) by ensuring observation of a wider range of metric scores in training. Experimental results on the VoiceBank-DEMAND dataset show relative improvement in PESQ score of 3.8% (3.05 vs 3.22 PESQ score), as well as better generalisation to unseen noise and speech.
    FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction. (arXiv:2203.08411v2 [cs.CL] UPDATED)
    Sequence modeling has demonstrated state-of-the-art performance on natural language and document understanding tasks. However, it is challenging to correctly serialize tokens in form-like documents in practice due to their variety of layout patterns. We propose FormNet, a structure-aware sequence model to mitigate the suboptimal serialization of forms. First, we design Rich Attention that leverages the spatial relationship between tokens in a form for more precise attention score calculation. Second, we construct Super-Tokens for each word by embedding representations from their neighboring tokens through graph convolutions. FormNet therefore explicitly recovers local syntactic information that may have been lost during serialization. In experiments, FormNet outperforms existing methods with a more compact model size and less pre-training data, establishing new state-of-the-art performance on CORD, FUNSD and Payment benchmarks.  ( 2 min )
    Direct evaluation of progression or regression of disease burden in brain metastatic disease with Deep Neuroevolution. (arXiv:2203.12853v1 [cs.NE])
    Purpose: A core component of advancing cancer treatment research is assessing response to therapy. Doing so by hand, for example as per RECIST or RANO criteria, is tedious, time-consuming, and can miss important tumor response information; most notably, these criteria exclude non-target lesions. We wish to assess change in a holistic fashion that includes all lesions, obtaining simple, informative, and automated assessments of tumor progression or regression. Due to often low patient enrolments in clinical trials, we wish to make response assessments with small training sets. Deep neuroevolution (DNE) can produce radiology artificial intelligence (AI) that performs well on small training sets. Here we use DNE for function approximation that predicts progression versus regression of metastatic brain disease. Methods: We analyzed 50 pairs of MRI contrast-enhanced images as our training set. Half of these pairs, separated in time, qualified as disease progression, while the other 25 pairs constituted regression. We trained the parameters of a relatively small CNN via mutations consisting of random CNN weight adjustments, evaluating each mutation's fitness. We then incorporated the best mutations into the next generation's CNN, repeating this process for approximately 50,000 generations. We applied the CNNs to our training set, as well as a separate testing set with the same class balance of 25 progression and 25 regression images. Results: DNE achieved monotonic convergence to 100% training set accuracy. DNE also converged monotonically to 100% testing set accuracy. Conclusion: DNE can accurately classify brain-metastatic disease progression versus regression. Future work will extend the input from 2D image slices to full 3D volumes, and include the category of no change. We believe that an approach such as ours could ultimately provide a useful adjunct to RANO/RECIST assessment.
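    The mutate-evaluate-keep loop of neuroevolution can be sketched in miniature. The toy fitness function, weight vector, and mutation scale below are illustrative assumptions, not the paper's setup (which perturbs CNN weights and scores classification fitness):

```python
import random

def evolve(weights, fitness, generations=200, sigma=0.1, seed=0):
    """Toy (1+1)-style neuroevolution: apply random Gaussian weight
    perturbations and keep a mutation only if it improves fitness."""
    rng = random.Random(seed)
    best, best_fit = list(weights), fitness(weights)
    for _ in range(generations):
        cand = [w + rng.gauss(0.0, sigma) for w in best]
        f = fitness(cand)
        if f > best_fit:  # incorporate the best mutation
            best, best_fit = cand, f
    return best, best_fit

# toy fitness: maximize -(w - 1)^2 summed over weights (optimum at all ones)
fit = lambda ws: -sum((w - 1.0) ** 2 for w in ws)
final, score = evolve([0.0, 0.0], fit, generations=2000)
```

With gradient-free updates like this, only the forward pass (fitness evaluation) is needed, which is what makes the approach viable for small training sets.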
    Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders. (arXiv:2203.12742v1 [cs.LG])
    Bayesian optimization is a gold standard for query-efficient continuous optimization. However, its adoption for drug and antibody sequence design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, enabling gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit trade-off over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on a small-molecule task based on the ZINC dataset and introduce a new large-molecule task targeting fluorescent proteins. In our experiments, LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that Bayesian optimization is practical and effective for biological sequence design.
    GraphCoCo: Graph Complementary Contrastive Learning. (arXiv:2203.12821v1 [cs.LG])
    Graph Contrastive Learning (GCL) has shown promising performance in graph representation learning (GRL) without the supervision of manual annotations. GCL can generate graph-level embeddings by maximizing the Mutual Information (MI) between different augmented views of the same graph (positive pairs). However, we identify an obstacle: the optimization of the InfoNCE loss concentrates on only a few embedding dimensions, limiting the distinguishability of embeddings in downstream graph classification tasks. This paper proposes an effective graph complementary contrastive learning approach named GraphCoCo to tackle the above issue. Specifically, we set the embedding of the first augmented view as the anchor embedding to localize "highlighted" dimensions (i.e., the dimensions that contribute most to the similarity measurement). We then remove these dimensions from the embeddings of the second augmented view to discover neglected complementary representations. The combination of anchor and complementary embeddings therefore significantly improves the performance in downstream tasks. Comprehensive experiments on various benchmark datasets are conducted to demonstrate the effectiveness of GraphCoCo, and the results show that our model outperforms the state-of-the-art methods. Source code will be made publicly available.
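    The anchor-and-complement idea can be sketched as: find the highest-magnitude dimensions of the anchor embedding and suppress them in the second view's embedding. This is a simplified stand-in for the paper's procedure, using magnitude as a proxy for "contribution to similarity":

```python
def complementary_embedding(anchor, other, k):
    """Zero out, in `other`, the k dimensions where `anchor` has the
    largest magnitude, forcing the second view's embedding to carry
    information the anchor's 'highlighted' dimensions already cover."""
    top = sorted(range(len(anchor)), key=lambda i: abs(anchor[i]), reverse=True)[:k]
    return [0.0 if i in top else v for i, v in enumerate(other)]

# dims 0 and 2 dominate the anchor, so they are suppressed in the second view
print(complementary_embedding([5.0, 0.1, 3.0], [1.0, 2.0, 3.0], k=2))  # [0.0, 2.0, 0.0]
```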
    Knowledge Removal in Sampling-based Bayesian Inference. (arXiv:2203.12964v1 [cs.LG])
    The right to be forgotten has been legislated in many countries, but its enforcement in the AI industry would cause unbearable costs. When even a single data deletion request arrives, companies may need to delete whole models learned with massive resources. Existing works propose methods to remove knowledge learned from data for explicitly parameterized models, which however are not applicable to sampling-based Bayesian inference, i.e., Markov chain Monte Carlo (MCMC), as MCMC can only infer implicit distributions. In this paper, we propose the first machine unlearning algorithm for MCMC. We first convert the MCMC unlearning problem into an explicit optimization problem. Based on this problem conversion, an {\it MCMC influence function} is designed to provably characterize the learned knowledge from data, which then delivers the MCMC unlearning algorithm. Theoretical analysis shows that MCMC unlearning would not compromise the generalizability of the MCMC models. Experiments on Gaussian mixture models and Bayesian neural networks confirm the effectiveness of the proposed algorithm. The code is available at \url{https://github.com/fshp971/mcmc-unlearning}.
    Learning Dense Correspondence from Synthetic Environments. (arXiv:2203.12919v1 [cs.CV])
    Estimation of human shape and pose from a single image is a challenging task. It is an even more difficult problem to map the identified human shape onto a 3D human model. Existing methods map manually labelled human pixels in real 2D images onto the 3D surface, which is prone to human error, and the sparsity of available annotated data often leads to sub-optimal results. We propose to solve the problem of data scarcity by training 2D-3D human mapping algorithms using automatically generated synthetic data for which exact and dense 2D-3D correspondence is known. Such a learning strategy using synthetic environments has a high generalisation potential towards real-world data. Using different camera parameter variations, background and lighting settings, we created precise ground truth data that constitutes a wider distribution. We evaluate the performance of models trained on synthetic data using the COCO dataset and validation framework. Results show that training 2D-3D mapping network models on synthetic data is a viable alternative to using real data.  ( 2 min )
    Deep Bidirectional Transformers for SoC Flow Specification Mining. (arXiv:2203.13182v1 [cs.LG])
    High-quality system-level message flow specifications can lead to comprehensive validation of system-on-chip (SoC) designs. We propose a disruptive method that utilizes an attention mechanism to produce accurate flow specifications from SoC IP communication traces. The proposed method can overcome the inherent complexity of SoC traces induced by the concurrency and parallelism of multicore designs that existing flow specification mining tools often find extremely challenging. Experiments on highly interleaved traces show promising flow reconstruction compared to several tools dedicated to the flow specification mining problem.  ( 2 min )
    A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization With Non-isolated Local Minima. (arXiv:2203.10973v2 [cs.LG] UPDATED)
    Non-convex loss functions arise frequently in modern machine learning, and for the theoretical analysis of stochastic optimization methods, the presence of non-isolated minima presents a unique challenge that has remained under-explored. In this paper, we study the local convergence of the stochastic gradient descent method to non-isolated global minima. Under mild assumptions, we estimate the probability for the iterations to stay near the minima by adopting the notion of stochastic stability. After establishing such stability, we present the lower bound complexity in terms of various error criteria for a given error tolerance $\epsilon$ and a failure probability $\gamma$.  ( 2 min )
    Enhancing Classifier Conservativeness and Robustness by Polynomiality. (arXiv:2203.12693v1 [cs.LG])
    We illustrate the detrimental effects, such as overconfident decisions, that exponential behavior can have in methods like classical LDA and logistic regression. We then show how polynomiality can remedy the situation. Among other benefits, this purposefully leads to random-level performance in the tails, away from the bulk of the training data. A directly related, simple, yet important technical novelty we subsequently present is softRmax: a reasoned alternative to the standard softmax function employed in contemporary (deep) neural networks. It is derived through linking the standard softmax to Gaussian class-conditional models, as employed in LDA, and replacing those by a polynomial alternative. We show that two aspects of softRmax, conservativeness and inherent gradient regularization, lead to robustness against adversarial attacks without gradient obfuscation.
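    The exact softRmax definition is not given in this abstract, so the sketch below uses a generic, hypothetical polynomial-normalized alternative purely to illustrate the tail behaviour being contrasted: softmax saturates to a one-hot output as scores scale up, while a polynomial normalization does not:

```python
import math

def softmax(z):
    """Standard softmax with the usual max-shift for numerical stability."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def poly_norm(z, degree=2):
    """Hypothetical polynomial alternative: shift scores positive, raise
    to a fixed degree, normalize. As inputs scale up, the outputs approach
    a fixed non-degenerate distribution instead of saturating to one-hot."""
    shifted = [v - min(z) + 1.0 for v in z]
    p = [v ** degree for v in shifted]
    s = sum(p)
    return [v / s for v in p]
```

Scaling the logits by 100x drives softmax's top probability arbitrarily close to 1, while the polynomial version stays conservative, which is the overconfidence behaviour the paper targets.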
    MR Image Denoising and Super-Resolution Using Regularized Reverse Diffusion. (arXiv:2203.12621v1 [eess.IV])
    Patient scans from MRI often suffer from noise, which hampers the diagnostic capability of such images. As a method to mitigate such artifacts, denoising is widely studied, both within the medical imaging community and beyond it as a general subject. However, recent deep neural network-based approaches mostly rely on the minimum mean squared error (MMSE) estimates, which tend to produce a blurred output. Moreover, such models suffer when deployed in real-world situations: out-of-distribution data, and complex noise distributions that deviate from the usual parametric noise models. In this work, we propose a new denoising method based on score-based reverse diffusion sampling, which overcomes all the aforementioned drawbacks. Our network, trained only with coronal knee scans, excels even on out-of-distribution in vivo liver MRI data contaminated with a complex mixture of noise. Moreover, we propose a method to enhance the resolution of the denoised image with the same network. With extensive experiments, we show that our method establishes state-of-the-art performance, while having desirable properties which prior MMSE denoisers did not have: flexibly choosing the extent of denoising, and quantifying uncertainty.
    Vision-Based Manipulators Need to Also See from Their Hands. (arXiv:2203.12677v1 [cs.RO])
    We study how the choice of visual perspective affects learning and generalization in the context of physical manipulation from raw sensor observations. Compared with the more commonly used global third-person perspective, a hand-centric (eye-in-hand) perspective affords reduced observability, but we find that it consistently improves training efficiency and out-of-distribution generalization. These benefits hold across a variety of learning algorithms, experimental settings, and distribution shifts, and for both simulated and real robot apparatuses. However, this is only the case when hand-centric observability is sufficient; otherwise, including a third-person perspective is necessary for learning, but also harms out-of-distribution generalization. To mitigate this, we propose to regularize the third-person information stream via a variational information bottleneck. On six representative manipulation tasks with varying hand-centric observability adapted from the Meta-World benchmark, this results in a state-of-the-art reinforcement learning agent operating from both perspectives improving its out-of-distribution generalization on every task. While some practitioners have long put cameras in the hands of robots, our work systematically analyzes the benefits of doing so and provides simple and broadly applicable insights for improving end-to-end learned vision-based robotic manipulation.  ( 2 min )
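    The variational information bottleneck penalizes the third-person feature stream with a KL term; for a diagonal-Gaussian encoder regularized toward a standard normal prior, that term has the familiar closed form below (the encoder network itself is omitted, and the pairing with a hand-centric stream is the paper's design, not shown here):

```python
import math

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL(q || N(0, I)) for a diagonal Gaussian q = N(mu, diag(exp(log_var))):
    the compression penalty a variational information bottleneck adds to
    an information stream, per latent dimension, summed."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

print(gaussian_kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

The penalty is zero only when the encoder already outputs the prior, so minimizing it squeezes out third-person information that the task loss does not actively need.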
    Evaluation of Non-Invasive Thermal Imaging for detection of Viability of Onchocerciasis worms. (arXiv:2203.12620v1 [eess.IV])
    Onchocerciasis causes blindness in over half a million people in the world today. Drug development for the disease is crippled as there is no way of measuring the effectiveness of a drug without an invasive procedure. Measuring drug efficacy through assessment of the viability of onchocerca worms requires patients to undergo nodulectomy, which is an invasive, expensive, time-consuming, skill-dependent, infrastructure-dependent, and lengthy process. In this paper, we discuss the first-ever study that proposes the use of machine learning over thermal imaging to non-invasively and accurately predict the viability of the worms. The key contributions of the paper are (i) a unique thermal imaging protocol along with pre-processing steps such as alignment, registration and segmentation to extract interpretable features, (ii) extraction of relevant semantic features, and (iii) development of accurate classifiers for detecting the existence of viable worms in a nodule. When tested on prospective test data of 30 participants with 48 palpable nodules, we achieved an Area Under the Curve (AUC) of 0.85.
    NPC: Neuron Path Coverage via Characterizing Decision Logic of Deep Neural Networks. (arXiv:2203.12915v1 [cs.LG])
    Deep learning has recently been widely applied to many applications across different domains, e.g., image classification and audio recognition. However, the quality of Deep Neural Networks (DNNs) still raises concerns in the practical operational environment, which calls for systematic testing, especially in safety-critical scenarios. Inspired by software testing, a number of structural coverage criteria have been designed and proposed to measure the test adequacy of DNNs. However, due to the black-box nature of DNNs, the existing structural coverage criteria are difficult to interpret, making it hard to understand the underlying principles of these criteria. The relationship between the structural coverage and the decision logic of DNNs is unknown. Moreover, recent studies have further revealed the non-existence of correlation between the structural coverage and DNN defect detection, which further poses concerns about what a suitable DNN testing criterion should be. In this paper, we propose interpretable coverage criteria through constructing the decision structure of a DNN. Mirroring the control flow graph of a traditional program, we first extract a decision graph from a DNN based on its interpretation, where a path of the decision graph represents a decision logic of the DNN. Based on the control flow and data flow of the decision graph, we propose two variants of path coverage to measure the adequacy of the test cases in exercising the decision logic. The higher the path coverage, the more diverse decision logic the DNN is expected to explore. Our large-scale evaluation results demonstrate that: the path in the decision graph is effective in characterizing the decision of the DNN, and the proposed coverage criteria are also sensitive to errors, including natural errors and adversarial examples, and strongly correlated with the output impartiality.  ( 3 min )
    GEMA: An open-source Python library for self-organizing-maps. (arXiv:2203.13190v1 [cs.NE])
    Organizations have realized the importance of data analysis and its benefits. This, in combination with Machine Learning algorithms, has allowed problems to be solved more easily, making these processes less time-consuming. Neural networks are the Machine Learning technique that has recently been obtaining the best results. This paper describes an open-source Python library called GEMA, developed to work with a type of neural network model called Self-Organizing Maps. GEMA is freely available under the GNU General Public License at GitHub (https://github.com/ufvceiec/GEMA). The library has been evaluated in a particular use case, obtaining accurate results.  ( 2 min )
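    A self-organizing map's core training loop, independent of GEMA's actual API (which is not reproduced here), is: find each sample's best-matching unit on the grid, then pull that unit and its grid neighbours toward the sample. A minimal dependency-free sketch:

```python
import math
import random

def train_som(data, grid_w, grid_h, dim, epochs=20, lr=0.5, radius=1.0, seed=0):
    """Minimal Self-Organizing Map: for each sample, locate the
    best-matching unit (BMU) and move the BMU and its grid neighbours
    toward the sample, weighted by a Gaussian neighbourhood function."""
    rng = random.Random(seed)
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(grid_w) for y in range(grid_h)}
    for _ in range(epochs):
        for sample in data:
            # BMU: grid node whose weight vector is closest to the sample
            bmu = min(weights, key=lambda n: sum((w - s) ** 2
                                                for w, s in zip(weights[n], sample)))
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # neighbourhood factor
                weights[node] = [wi + lr * h * (si - wi)
                                 for wi, si in zip(w, sample)]
    return weights
```

Trained on a single repeated point, every node's weight vector converges toward that point, which is the map's basic topology-preserving behaviour.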
    Kernel-Based Reinforcement Learning: A Finite-Time Analysis. (arXiv:2004.05599v3 [cs.LG] UPDATED)
    We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $\widetilde{O}\left( H^3 K^{\frac{2d}{2d+1}}\right)$, where $d$ is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and has been previously applied to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.  ( 2 min )
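    The non-parametric estimator at the heart of such kernel-based algorithms is a Nadaraya-Watson smoother over past observations; the optimism (exploration) bonus is omitted in this sketch, and the 1-D state and Gaussian kernel are simplifying assumptions:

```python
import math

def kernel_estimate(history, x, bandwidth=0.5):
    """Nadaraya-Watson kernel estimate of the mean reward at query
    point x from (observation, reward) pairs: a weighted average with
    weights decaying smoothly in the distance to x."""
    num = den = 0.0
    for xi, ri in history:
        w = math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2))
        num += w * ri
        den += w
    return num / den if den > 0 else 0.0
```

Observations near the query dominate the estimate, which is how smoothness of the MDP is exploited to share information across nearby state-action pairs.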
    Optimal Rates of (Locally) Differentially Private Heavy-tailed Multi-Armed Bandits. (arXiv:2106.02575v5 [cs.LG] UPDATED)
    In this paper we investigate the problem of stochastic multi-armed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike previous results that assume bounded/sub-Gaussian reward distributions, we focus on the setting where each arm's reward distribution only has $(1+v)$-th moment with some $v\in (0, 1]$. In the first part, we study the problem in the central $\epsilon$-DP model. We first provide a near-optimal result by developing a private and robust Upper Confidence Bound (UCB) algorithm. Then, we improve the result via a private and robust version of the Successive Elimination (SE) algorithm. Finally, we establish the lower bound to show that the instance-dependent regret of our improved algorithm is optimal. In the second part, we study the problem in the $\epsilon$-LDP model. We propose an algorithm that can be seen as a locally private and robust version of the SE algorithm, which provably achieves (near) optimal rates for both instance-dependent and instance-independent regret. Our results reveal differences between the problem of private MAB with bounded/sub-Gaussian rewards and heavy-tailed rewards. To achieve these (near) optimal rates, we develop several new hard instances and private robust estimators as byproducts, which might be useful for other related problems. Finally, experiments also support our theoretical findings and show the effectiveness of our algorithms.  ( 2 min )
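    A standard robustness building block for rewards with only a bounded $(1+v)$-th moment is a truncated empirical mean; the growing threshold schedule below is a common textbook choice, not the paper's exact (private) estimator, and `b` stands in for the assumed moment bound:

```python
def truncated_mean(samples, v, b):
    """Robust mean for heavy-tailed rewards with bounded (1+v)-th
    moment: clip the i-th observation at a threshold that grows with i,
    so rare extreme values cannot dominate the estimate."""
    total = 0.0
    for i, x in enumerate(samples, start=1):
        thresh = (b * i) ** (1.0 / (1.0 + v))  # grows sublinearly in i
        total += max(-thresh, min(thresh, x))
    return total / len(samples)
```

A single outlier of size 1e6 among four samples barely moves the truncated estimate, whereas it would push the naive mean to 250000; well-behaved samples under the threshold are left untouched.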
    Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction. (arXiv:2203.13088v1 [cs.IR])
    Recent progress in neural information retrieval has demonstrated large gains in effectiveness, while often sacrificing the efficiency and interpretability of the neural model compared to classical approaches. This paper proposes ColBERTer, a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction. Along the effectiveness Pareto frontier, ColBERTer's reductions dramatically lower ColBERT's storage requirements while simultaneously improving the interpretability of its token-matching scores. To this end, ColBERTer fuses single-vector retrieval, multi-vector refinement, and optional lexical matching components into one model. For its multi-vector component, ColBERTer reduces the number of stored vectors per document by learning unique whole-word representations for the terms in each document and learning to identify and remove word representations that are not essential to effective scoring. We employ an explicit multi-task, multi-stage training to facilitate using very small vector dimensions. Results on the MS MARCO and TREC-DL collection show that ColBERTer can reduce the storage footprint by up to 2.5x, while maintaining effectiveness. With just one dimension per token in its smallest setting, ColBERTer achieves index storage parity with the plaintext size, with very strong effectiveness results. Finally, we demonstrate ColBERTer's robustness on seven high-quality out-of-domain collections, yielding statistically significant gains over traditional retrieval baselines.  ( 2 min )
    Accurate Shapley Values for explaining tree-based models. (arXiv:2106.03820v2 [stat.ML] UPDATED)
    Although Shapley Values (SV) are widely used in explainable AI, they can be poorly understood and estimated, implying that their analysis may lead to spurious inferences and explanations. As a starting point, we recall an invariance principle for SV and derive the correct approach for computing the SV of categorical variables that are particularly sensitive to the encoding used. In the case of tree-based models, we introduce two estimators of Shapley Values that exploit the tree structure efficiently and are more accurate than state-of-the-art methods. Simulations and comparisons are performed with state-of-the-art algorithms and show the practical gain of our approach. Finally, we discuss the ability of SV to provide reliable local explanations. We also provide a Python package that computes our estimators at https://github.com/salimamoukou/acv00.  ( 2 min )
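    For context, exact Shapley values on a small feature set can be computed by direct enumeration of coalitions, which is exponential in the number of features; the paper's estimators exploit tree structure precisely to avoid this cost. A minimal sketch with an illustrative additive game:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumeration: each player's marginal
    contribution, averaged over all coalitions with the standard
    combinatorial weights |S|!(n-|S|-1)!/n!."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for s in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(frozenset(s) | {p}) - value(frozenset(s)))
        phi[p] = total
    return phi

# additive game: Shapley values recover each player's own contribution
v = lambda s: sum({"a": 1.0, "b": 2.0, "c": 3.0}[p] for p in s)
print(shapley_values(["a", "b", "c"], v))
```

For an additive value function the SVs equal the per-player contributions exactly, which makes the enumeration easy to sanity-check.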
    Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits. (arXiv:2109.09855v2 [cs.LG] UPDATED)
    We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed R(MA)^2B. The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward of pulling an arm depends on both the current state of the corresponding MDP and the action taken. The goal is to sequentially choose actions for arms so as to maximize the expected value of the cumulative rewards collected. Since finding the optimal policy is typically intractable, we propose a computationally appealing index policy which we call Occupancy-Measured-Reward Index Policy. Our policy is well-defined even if the underlying MDPs are not indexable. We prove that it is asymptotically optimal when the activation budget and number of arms are scaled up, while keeping their ratio as a constant. For the case when the system parameters are unknown, we develop a learning algorithm. Our learning algorithm uses the principle of optimism in the face of uncertainty and further uses a generative model in order to fully exploit the structure of Occupancy-Measured-Reward Index Policy. We call it the R(MA)^2B-UCB algorithm. As compared with the existing algorithms, R(MA)^2B-UCB performs close to an offline optimum policy, and also achieves a sub-linear regret with a low computational complexity. Experimental results show that R(MA)^2B-UCB outperforms the existing algorithms in both regret and run time.  ( 2 min )
    Waveform Learning for Next-Generation Wireless Communication Systems. (arXiv:2109.00998v3 [cs.IT] UPDATED)
    We propose a learning-based method for the joint design of a transmit and receive filter, the constellation geometry and associated bit labeling, as well as a neural network (NN)-based detector. The method maximizes an achievable information rate, while simultaneously satisfying constraints on the adjacent channel leakage ratio (ACLR) and peak-to-average power ratio (PAPR). This allows control of the tradeoff between spectral containment, peak power, and communication rate. Evaluation on an additive white Gaussian noise (AWGN) channel shows significant reduction of ACLR and PAPR compared to a conventional baseline relying on quadrature amplitude modulation (QAM) and root-raised-cosine (RRC), without significant loss of information rate. When considering a 3rd Generation Partnership Project (3GPP) multipath channel, the learned waveform and neural receiver enable competitive or higher rates than an orthogonal frequency division multiplexing (OFDM) baseline, while reducing the ACLR by 10 dB and the PAPR by 2 dB. The proposed method incurs no additional complexity on the transmitter side and might be an attractive tool for waveform design of beyond-5G systems.  ( 2 min )
    Pseudo Label Is Better Than Human Label. (arXiv:2203.12668v1 [cs.LG])
    State-of-the-art automatic speech recognition (ASR) systems are trained with tens of thousands of hours of labeled speech data. Human transcription is expensive and time consuming. Factors such as the quality and consistency of the transcription can greatly affect the performance of the ASR models trained with these data. In this paper, we show that we can train a strong teacher model to produce high quality pseudo labels by utilizing recent self-supervised and semi-supervised learning techniques. Specifically, we use JUST (Joint Unsupervised/Supervised Training) and iterative noisy student teacher training to train a 600 million parameter bi-directional teacher model. This model achieved 4.0% word error rate (WER) on a voice search task, 11.1% relatively better than a baseline. We further show that by using this strong teacher model to generate high-quality pseudo labels for training, we can achieve 13.6% relative WER reduction (5.9% to 5.1%) for a streaming model compared to using human labels.  ( 2 min )
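The relative improvement quoted for the streaming model is plain ratio arithmetic, which can be checked directly:

```python
# Relative WER reduction for the streaming model quoted in the abstract:
# absolute WER drops from 5.9% (human labels) to 5.1% (pseudo labels).
baseline_wer, pseudo_wer = 5.9, 5.1
relative_reduction = (baseline_wer - pseudo_wer) / baseline_wer
print(f"{relative_reduction:.1%}")  # → 13.6%
```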
    Risk Consistent Multi-Class Learning from Label Proportions. (arXiv:2203.12836v1 [cs.LG])
This study addresses a multiclass learning from label proportions (MCLLP) setting in which training instances are provided in bags and only the proportion of each class within the bags is provided. Most existing MCLLP methods impose bag-wise constraints on the prediction of instances or assign them pseudo-labels; however, none of these methods has a guarantee of theoretical consistency. To solve this problem, a risk-consistent method is proposed for instance classification using the empirical risk minimization framework, and its estimation error bound is derived. An approximation method for the proposed risk estimator is also introduced so that it can be applied to large bags, by adapting the bag-wise constraints used in existing research. The proposed method can be applied to any deep model or loss and is compatible with stochastic optimization. Experiments are conducted on benchmarks to verify the effectiveness of the proposed method.  ( 2 min )
    When Accuracy Meets Privacy: Two-Stage Federated Transfer Learning Framework in Classification of Medical Images on Limited Data: A COVID-19 Case Study. (arXiv:2203.12803v1 [eess.IV])
The COVID-19 pandemic has spread rapidly and caused a shortage of global medical resources, making the efficiency of COVID-19 diagnosis highly significant. As deep learning and convolutional neural networks (CNNs) have been widely utilized and verified in analyzing medical images, they have become a powerful tool for computer-assisted diagnosis. However, there are two significant challenges in medical image classification with deep learning. One is the difficulty of acquiring enough samples, which may lead to model overfitting. The other stems from privacy concerns, since medical records are often deemed patients' private information and protected by laws such as GDPR and HIPAA. Federated learning ensures that model training is decentralized across different devices and no data is shared among them, which guarantees privacy. However, with data located on different devices, the accessible data of each device may be limited. Since transfer learning has been verified to perform well with limited data, in this paper we implement federated learning and transfer learning techniques with CNNs to classify COVID-19 from lung CT scans. We also explore the impact of the client-side dataset distribution in federated learning and of the number of training epochs. Finally, we obtain very high performance with federated learning, demonstrating success in balancing accuracy and privacy.  ( 3 min )
    Linearizing Transformer with Key-Value Memory Bank. (arXiv:2203.12644v1 [cs.CL])
Transformer has brought great success to a wide range of natural language processing tasks. Nevertheless, the computational overhead of the vanilla transformer scales quadratically with sequence length. Many efforts have been made to develop more efficient transformer variants. A line of work (e.g., Linformer) projects the input sequence into a low-rank space, achieving linear time complexity. However, Linformer does not suit text generation tasks well, as the sequence length must be pre-specified. We propose MemSizer, an approach that also projects the source sequence into a lower-dimensional representation but can take inputs of dynamic length, taking a different perspective on the attention mechanism. MemSizer not only achieves the same linear time complexity but also enjoys efficient recurrent-style autoregressive generation, which yields constant memory complexity and reduced computation at inference. We demonstrate that MemSizer provides an improved tradeoff between efficiency and accuracy over the vanilla transformer and other linear variants in language modeling and machine translation tasks, revealing a viable direction towards further inference efficiency improvement.  ( 2 min )
    DPar2: Fast and Scalable PARAFAC2 Decomposition for Irregular Dense Tensors. (arXiv:2203.12798v1 [cs.LG])
    Given an irregular dense tensor, how can we efficiently analyze it? An irregular tensor is a collection of matrices whose columns have the same size and rows have different sizes from each other. PARAFAC2 decomposition is a fundamental tool to deal with an irregular tensor in applications including phenotype discovery and trend analysis. Although several PARAFAC2 decomposition methods exist, their efficiency is limited for irregular dense tensors due to the expensive computations involved with the tensor. In this paper, we propose DPar2, a fast and scalable PARAFAC2 decomposition method for irregular dense tensors. DPar2 achieves high efficiency by effectively compressing each slice matrix of a given irregular tensor, careful reordering of computations with the compression results, and exploiting the irregularity of the tensor. Extensive experiments show that DPar2 is up to 6.0x faster than competitors on real-world irregular tensors while achieving comparable accuracy. In addition, DPar2 is scalable with respect to the tensor size and target rank.  ( 2 min )
    Competency Assessment for Autonomous Agents using Deep Generative Models. (arXiv:2203.12670v1 [cs.LG])
    For autonomous agents to act as trustworthy partners to human users, they must be able to reliably communicate their competency for the tasks they are asked to perform. Towards this objective, we develop probabilistic world models based on deep generative modelling that allow for the simulation of agent trajectories and accurate calculation of tasking outcome probabilities. By combining the strengths of conditional variational autoencoders with recurrent neural networks, the deep generative world model can probabilistically forecast trajectories over long horizons to task completion. We show how these forecasted trajectories can be used to calculate outcome probability distributions, which enable the precise assessment of agent competency for specific tasks and initial settings.  ( 2 min )
    Applications of physics informed neural operators. (arXiv:2203.12634v1 [physics.comp-ph])
We present an end-to-end framework to learn partial differential equations that brings together initial data production, selection of boundary conditions, and the use of physics-informed neural operators to solve partial differential equations that are ubiquitous in the study and modeling of physics phenomena. We first demonstrate that our methods reproduce the accuracy and performance of other neural operators published elsewhere in the literature to learn the 1D wave equation and the 1D Burgers equation. Thereafter, we apply our physics-informed neural operators to learn new types of equations, including the 2D Burgers equation in its scalar, inviscid, and vector forms. Finally, we show that our approach is also applicable to learn the physics of the 2D linear and nonlinear shallow water equations, which involve three coupled partial differential equations. We release our artificial intelligence surrogates and scientific software to produce initial data and boundary conditions to study a broad range of physically motivated scenarios. We provide the source code, an interactive website to visualize the predictions of our physics-informed neural operators, and a tutorial for their use at the Data and Learning Hub for Science.  ( 2 min )
    A Deep Reinforcement Learning-Based Caching Strategy for IoT Networks with Transient Data. (arXiv:2203.12674v1 [cs.NI])
The Internet of Things (IoT) has been continuously growing in the past few years, and its potential is now more apparent. However, transient data generation and limited energy resources are the major bottlenecks of these networks. Besides, minimum delay and other conventional quality-of-service measurements are still valid requirements to meet. An efficient caching policy can help meet standard quality-of-service requirements while bypassing the specific limitations of IoT networks. Adopting deep reinforcement learning (DRL) algorithms enables us to develop an effective caching scheme without the need for any prior knowledge or contextual information. In this work, we propose a DRL-based caching scheme that improves the cache hit rate and reduces the energy consumption of IoT networks, while taking the data freshness and limited lifetime of IoT data into account. To better capture regionally different popularity distributions, we propose a hierarchical architecture for deploying edge caching nodes in IoT networks. The results of comprehensive experiments show that our proposed method outperforms well-known conventional caching policies and an existing DRL-based solution in terms of cache hit rate and energy consumption by considerable margins.  ( 2 min )
    Towards All-Purpose Domain Adaptation Under Confounding. (arXiv:2203.12720v1 [stat.ML])
    Current domain adaptation methods address the problems of covariate shift or label shift, but are not applicable to the setting where they occur simultaneously and interact with each other. In this paper, we propose an assumption, confounded shift, to begin to address this problem. We also propose a framework for this task, based on minimizing the expected divergence between the source and target conditional distributions. Within this framework, we propose using the reverse KL divergence, demonstrating the use of both parametric linear Gaussian and nonparametric nonlinear Gaussian Process estimators of the conditional distribution. We also propose using the Maximum Mean Discrepancy (MMD) within our framework. To make confounded domain adaptation with the MMD effective, we propose an intelligent dynamic strategy for choosing the kernel bandwidth, which may be of independent interest even outside of the confounded shift context. Finally, we show that our approach is advantageous on a variety of synthetic and real datasets.  ( 2 min )
    Predicting Multi-Antenna Frequency-Selective Channels via Meta-Learned Linear Filters based on Long-Short Term Channel Decomposition. (arXiv:2203.12715v1 [eess.SP])
    An efficient data-driven prediction strategy for multi-antenna frequency-selective channels must operate based on a small number of pilot symbols. This paper proposes novel channel prediction algorithms that address this goal by integrating transfer and meta-learning with a reduced-rank parametrization of the channel. The proposed methods optimize linear predictors by utilizing data from previous frames, which are generally characterized by distinct propagation characteristics, in order to enable fast training on the time slots of the current frame. The proposed predictors rely on a novel long-short-term decomposition (LSTD) of the linear prediction model that leverages the disaggregation of the channel into long-term space-time signatures and fading amplitudes. We first develop predictors for single-antenna frequency-flat channels based on transfer/meta-learned quadratic regularization. Then, we introduce transfer and meta-learning algorithms for LSTD-based prediction models that build on equilibrium propagation (EP) and alternating least squares (ALS). Numerical results under the 3GPP 5G standard channel model demonstrate the impact of transfer and meta-learning on reducing the number of pilots for channel prediction, as well as the merits of the proposed LSTD parametrization.  ( 2 min )
    Mokey: Enabling Narrow Fixed-Point Inference for Out-of-the-Box Floating-Point Transformer Models. (arXiv:2203.12758v1 [cs.LG])
    Increasingly larger and better Transformer models keep advancing state-of-the-art accuracy and capability for Natural Language Processing applications. These models demand more computational power, storage, and energy. Mokey reduces the footprint of state-of-the-art 32-bit or 16-bit floating-point transformer models by quantizing all values to 4-bit indexes into dictionaries of representative 16-bit fixed-point centroids. Mokey does not need fine-tuning, an essential feature as often the training resources or datasets are not available to many. Exploiting the range of values that naturally occur in transformer models, Mokey selects centroid values to also fit an exponential curve. This unique feature enables Mokey to replace the bulk of the original multiply-accumulate operations with narrow 3b fixed-point additions resulting in an area- and energy-efficient hardware accelerator design. Over a set of state-of-the-art transformer models, the Mokey accelerator delivers an order of magnitude improvements in energy efficiency over a Tensor Cores-based accelerator while improving performance by at least $4\times$ and as much as $15\times$ depending on the model and on-chip buffering capacity. Optionally, Mokey can be used as a memory compression assist for any other accelerator, transparently stashing wide floating-point or fixed-point activations or weights into narrow 4-bit indexes. Mokey proves superior to prior state-of-the-art quantization methods for Transformers.  ( 2 min )
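The core dictionary-quantization step can be illustrated with a small sketch; the centroid grid below is a hypothetical exponential spacing for illustration, not the paper's fitting procedure:

```python
def quantize_to_centroids(values, centroids):
    """Replace each value by the index of its nearest centroid; with 16
    centroids, every value is stored as a 4-bit index."""
    return [min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values]

# Hypothetical 16-entry dictionary on an exponential grid, echoing the
# observation that transformer weights concentrate near zero.
centroids = [2.0 ** (k - 15) for k in range(16)]  # 2^-15 ... 2^0
weights = [0.003, 0.02, 0.1, 0.45, 0.9]
indexes = quantize_to_centroids(weights, centroids)
approx = [centroids[i] for i in indexes]  # dequantized values
```

Storing `indexes` instead of 16- or 32-bit values is what shrinks the model footprint; the accelerator's arithmetic savings come from operating on the narrow indexes directly.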
    Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions. (arXiv:2203.12667v1 [cs.CV])
    A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal, and receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, etc. Through structured analysis of current progress and challenges, we highlight the limitations of current VLN and opportunities for future work. This paper serves as a thorough reference for the VLN research community.  ( 2 min )
    The Challenges of Continuous Self-Supervised Learning. (arXiv:2203.12710v1 [cs.CV])
    Self-supervised learning (SSL) aims to eliminate one of the major bottlenecks in representation learning - the need for human annotations. As a result, SSL holds the promise to learn representations from data in-the-wild, i.e., without the need for finite and static datasets. Instead, true SSL algorithms should be able to exploit the continuous stream of data being generated on the internet or by agents exploring their environments. But do traditional self-supervised learning approaches work in this setup? In this work, we investigate this question by conducting experiments on the continuous self-supervised learning problem. While learning in the wild, we expect to see a continuous (infinite) non-IID data stream that follows a non-stationary distribution of visual concepts. The goal is to learn a representation that can be robust, adaptive yet not forgetful of concepts seen in the past. We show that a direct application of current methods to such continuous setup is 1) inefficient both computationally and in the amount of data required, 2) leads to inferior representations due to temporal correlations (non-IID data) in some sources of streaming data and 3) exhibits signs of catastrophic forgetting when trained on sources with non-stationary data distributions. We propose the use of replay buffers as an approach to alleviate the issues of inefficiency and temporal correlations. We further propose a novel method to enhance the replay buffer by maintaining the least redundant samples. Minimum redundancy (MinRed) buffers allow us to learn effective representations even in the most challenging streaming scenarios composed of sequential visual data obtained from a single embodied agent, and alleviates the problem of catastrophic forgetting when learning from data with non-stationary semantic distributions.  ( 2 min )
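The minimum-redundancy buffer idea can be sketched as a simple eviction rule (a heuristic reading of the description above; the paper's similarity measure and bookkeeping may differ):

```python
def minred_evict(buffer, similarity):
    """Evict the most redundant sample: the one whose maximum similarity to
    any other buffered sample is highest (minimum-redundancy heuristic)."""
    def redundancy(i):
        return max(similarity(buffer[i], buffer[j])
                   for j in range(len(buffer)) if j != i)
    drop = max(range(len(buffer)), key=redundancy)
    return buffer[:drop] + buffer[drop + 1:]

# Toy 1-D "features": similarity = negative absolute distance.
sim = lambda a, b: -abs(a - b)
buf = [0.0, 0.05, 1.0, 2.0]   # 0.0 and 0.05 are near-duplicates
buf = minred_evict(buf, sim)  # one of the near-duplicates is dropped
```

On temporally correlated streams, consecutive samples are near-duplicates, so this rule preferentially drops them and keeps the buffer diverse.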
    On the Search for Feedback in Reinforcement Learning. (arXiv:2002.09478v6 [cs.LG] UPDATED)
The problem of Reinforcement Learning (RL) in an unknown nonlinear dynamical system is equivalent to the search for an optimal feedback law utilizing the simulations/rollouts of the dynamical system. Most RL techniques search over a complex global nonlinear feedback parametrization, making them suffer from high training times as well as variance. Instead, we advocate searching over a local feedback representation consisting of an open-loop sequence and an associated optimal linear feedback law completely determined by the open-loop sequence. We show that this alternate approach results in highly efficient training, the answers obtained are repeatable and hence reliable, and the resulting closed-loop performance is superior to global state-of-the-art RL techniques. Finally, replanning whenever required, which is feasible due to the fast and reliable local solution, allows us to recover global optimality of the resulting feedback law.
    Is Fairness Only Metric Deep? Evaluating and Addressing Subgroup Gaps in Deep Metric Learning. (arXiv:2203.12748v1 [cs.LG])
    Deep metric learning (DML) enables learning with less supervision through its emphasis on the similarity structure of representations. There has been much work on improving generalization of DML in settings like zero-shot retrieval, but little is known about its implications for fairness. In this paper, we are the first to evaluate state-of-the-art DML methods trained on imbalanced data, and to show the negative impact these representations have on minority subgroup performance when used for downstream tasks. In this work, we first define fairness in DML through an analysis of three properties of the representation space -- inter-class alignment, intra-class alignment, and uniformity -- and propose finDML, the fairness in non-balanced DML benchmark to characterize representation fairness. Utilizing finDML, we find bias in DML representations to propagate to common downstream classification tasks. Surprisingly, this bias is propagated even when training data in the downstream task is re-balanced. To address this problem, we present Partial Attribute De-correlation (PARADE) to de-correlate feature representations from sensitive attributes and reduce performance gaps between subgroups in both embedding space and downstream metrics.  ( 2 min )
    mcBERT: Momentum Contrastive Learning with BERT for Zero-Shot Slot Filling. (arXiv:2203.12940v1 [cs.CL])
    Zero-shot slot filling has received considerable attention to cope with the problem of limited available data for the target domain. One of the important factors in zero-shot learning is to make the model learn generalized and reliable representations. For this purpose, we present mcBERT, which stands for momentum contrastive learning with BERT, to develop a robust zero-shot slot filling model. mcBERT uses BERT to initialize the two encoders, the query encoder and key encoder, and is trained by applying momentum contrastive learning. Our experimental results on the SNIPS benchmark show that mcBERT substantially outperforms the previous models, recording a new state-of-the-art. Besides, we also show that each component composing mcBERT contributes to the performance improvement.
    Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming. (arXiv:2203.01382v2 [cs.LG] CROSS LISTED)
Weak Supervision (WS) techniques allow users to efficiently create large training datasets by programmatically labeling data with heuristic sources of supervision. While the success of WS relies heavily on the provided labeling heuristics, the process of how these heuristics are created in practice has remained under-explored. In this work, we formalize the development process of labeling heuristics as an interactive procedure, built around the existing workflow where users draw ideas from a selected set of development data for designing the heuristic sources. With the formalism, we study two core problems: how to strategically select the development data to guide users in efficiently creating informative heuristics, and how to exploit the information within the development process to contextualize and better learn from the resultant heuristics. Building upon two novel methodologies that effectively tackle the respective problems, we present Nemo, an end-to-end interactive system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
    Representation of binary classification trees with binary features by quantum circuits. (arXiv:2108.13207v2 [quant-ph] UPDATED)
    We propose a quantum representation of binary classification trees with binary features based on a probabilistic approach. By using the quantum computer as a processor for probability distributions, a probabilistic traversal of the decision tree can be realized via measurements of a quantum circuit. We describe how tree inductions and the prediction of class labels of query data can be integrated into this framework. An on-demand sampling method enables predictions with a constant number of classical memory slots, independent of the tree depth. We experimentally study our approach using both a quantum computing simulator and actual IBM quantum hardware. To our knowledge, this is the first realization of a decision tree classifier on a quantum device.  ( 2 min )
    Audio-Visual Speech Enhancement using Multimodal Deep Convolutional Neural Network. (arXiv:1709.00944v4 [cs.SD] UPDATED)
    Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques focus on addressing audio information only. In this work, inspired by multimodal learning, which utilizes data from different modalities, and the recent success of convolutional neural networks (CNNs) in SE, we propose an audio-visual deep CNN (AVDCNN) SE model, which incorporates audio and visual streams into a unified network model. In the proposed AVDCNN SE model, audio and visual data are first processed using individual CNNs, and then, fused into a joint network to generate enhanced speech at the output layer. The AVDCNN model is trained in an end-to-end manner, and parameters are jointly learned through back-propagation. We evaluate enhanced speech using five objective criteria. Results show that the AVDCNN yields notably better performance, compared with an audio-only CNN-based SE model and two conventional SE approaches, confirming the effectiveness of integrating visual information into the SE process.  ( 2 min )
    Optimizing Variational Representations of Divergences and Accelerating their Statistical Estimation. (arXiv:2006.08781v3 [cs.LG] UPDATED)
    Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) They can be estimated from data as statistical averages. 2) Such representations can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving the tightness of such variational formulas, and accordingly accelerate statistical learning and estimation from data, is currently lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, the calculation of the functional Hessian of objective functionals unveils the local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations utilizing neural network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.  ( 2 min )
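A canonical example of such a variational representation (stated here for context; the abstract's tightened objectives need not coincide with it) is the Donsker–Varadhan formula for the Kullback–Leibler divergence,

$$ D_{\mathrm{KL}}(P \,\|\, Q) = \sup_{f} \left\{ \mathbb{E}_{P}[f(X)] - \log \mathbb{E}_{Q}\big[e^{f(X)}\big] \right\}, $$

where the supremum is over measurable $f$ for which both expectations are finite. Restricting $f$ to a neural network class turns the right-hand side into an objective estimable from samples as statistical averages, which is exactly the kind of formula whose tightness the methodology above seeks to improve.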
    The Dutch Draw: Constructing a Universal Baseline for Binary Prediction Models. (arXiv:2203.13084v1 [cs.LG])
    Novel prediction methods should always be compared to a baseline to know how well they perform. Without this frame of reference, the performance score of a model is basically meaningless. What does it mean when a model achieves an $F_1$ of 0.8 on a test set? A proper baseline is needed to evaluate the `goodness' of a performance score. Comparing with the latest state-of-the-art model is usually insightful. However, being state-of-the-art can change rapidly when newer models are developed. Contrary to an advanced model, a simple dummy classifier could be used. However, the latter could be beaten too easily, making the comparison less valuable. This paper presents a universal baseline method for all binary classification models, named the Dutch Draw (DD). This approach weighs simple classifiers and determines the best classifier to use as a baseline. We theoretically derive the DD baseline for many commonly used evaluation measures and show that in most situations it reduces to (almost) always predicting either zero or one. Summarizing, the DD baseline is: (1) general, as it is applicable to all binary classification problems; (2) simple, as it is quickly determined without training or parameter-tuning; (3) informative, as insightful conclusions can be drawn from the results. The DD baseline serves two purposes. First, to enable comparisons across research papers by this robust and universal baseline. Secondly, to provide a sanity check during the development process of a prediction model. It is a major warning sign when a model is outperformed by the DD baseline.  ( 2 min )
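The abstract's question about an $F_1$ of 0.8 can be made concrete: on an imbalanced test set, the better of the two constant classifiers (always predict 0 / always predict 1) already sets a non-trivial bar, which is essentially what the DD baseline reduces to. A minimal sketch, not the paper's full weighting procedure:

```python
def f1(y_true, y_pred):
    """F1 score from scratch; returns 0.0 when there are no true positives."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(p == 1 and t == 0 for t, p in zip(y_true, y_pred))
    fn = sum(p == 0 and t == 1 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical imbalanced test set: 8 positives, 2 negatives.
y = [1] * 8 + [0] * 2
all_ones, all_zeros = [1] * 10, [0] * 10
dd_baseline = max(f1(y, all_ones), f1(y, all_zeros))
print(dd_baseline)  # always-predict-1 wins: 2*8 / (2*8 + 2 + 0) = 16/18
```

Here a model scoring $F_1 = 0.8$ would fall below the trivial always-predict-1 baseline of about 0.889, which is exactly the sanity check the DD baseline is meant to provide.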
    Kernel-Based Reinforcement Learning: A Finite-Time Analysis. (arXiv:2004.05599v3 [cs.LG] UPDATED)
    We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $\widetilde{O}\left( H^3 K^{\frac{2d}{2d+1}}\right)$, where $d$ is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and has been previously applied to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.  ( 2 min )
    WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses. (arXiv:2203.10750v2 [cs.SD] UPDATED)
In this paper, we develop a new multi-singer Chinese neural singing voice synthesis (SVS) system named WeSinger. To improve the accuracy and naturalness of the synthesized singing voice, we design several specialized modules and techniques: 1) A deep bi-directional LSTM-based duration model with a multi-scale rhythm loss and post-processing step; 2) A Transformer-like acoustic model with a progressive pitch-weighted decoder loss; 3) A 24 kHz pitch-aware LPCNet neural vocoder to produce high-quality singing waveforms; 4) A novel data augmentation method with multi-singer pre-training for stronger robustness and naturalness. Both quantitative and qualitative evaluation results demonstrate the effectiveness of WeSinger in terms of accuracy and naturalness, and WeSinger achieves state-of-the-art performance on the public corpus Opencpop. Some synthesized singing samples are available online (https://zzw922cn.github.io/WeSinger/).  ( 2 min )
Multilevel Bayesian Deep Neural Networks. (arXiv:2203.12961v1 [stat.CO])
In this article we consider Bayesian inference associated to deep neural networks (DNNs) and in particular, trace-class neural network (TNN) priors which were proposed by Sell et al. [39]. Such priors were developed as more robust alternatives to classical architectures in the context of inference problems. For this work we develop multilevel Monte Carlo (MLMC) methods for such models. MLMC is a popular variance reduction technique, with particular applications in Bayesian statistics and uncertainty quantification. We show how a particular advanced MLMC method that was introduced in [4] can be applied to Bayesian inference from DNNs, and establish mathematically that the computational cost to achieve a particular mean square error, associated to posterior expectation computation, can be reduced by several orders of magnitude versus more conventional techniques. To verify such results we provide numerous numerical experiments on model problems arising in machine learning. These include Bayesian regression, as well as Bayesian classification and reinforcement learning.  ( 2 min )
    A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces. (arXiv:2007.05078v2 [cs.LG] UPDATED)
    In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time, which quantifies its level of non-stationarity. Our method generalizes previous approaches based on sliding windows and exponential discounting used to handle changing environments. We further propose a practical implementation of KeRNS, we analyze its regret and validate it experimentally.  ( 2 min )
    Addressing Missing Sources with Adversarial Support-Matching. (arXiv:2203.13154v1 [stat.ML])
    When trained on diverse labeled data, machine learning models have proven themselves to be a powerful tool in all facets of society. However, due to budget limitations, deliberate or non-deliberate censorship, and other problems during data collection and curation, the labeled training set might exhibit a systematic shortage of data for certain groups. We investigate a scenario in which the absence of certain data is linked to the second level of a two-level hierarchy in the data. Inspired by the idea of protected groups from algorithmic fairness, we refer to the partitions carved by this second level as "subgroups"; we refer to combinations of subgroups and classes, or leaves of the hierarchy, as "sources". To characterize the problem, we introduce the concept of classes with incomplete subgroup support. The representational bias in the training set can give rise to spurious correlations between the classes and the subgroups which render standard classification models ungeneralizable to unseen sources. To overcome this bias, we make use of an additional, diverse but unlabeled dataset, called the "deployment set", to learn a representation that is invariant to subgroup. This is done by adversarially matching the support of the training and deployment sets in representation space. In order to learn the desired invariance, it is paramount that the sets of samples observed by the discriminator are balanced by class; this is easily achieved for the training set, but requires using semi-supervised clustering for the deployment set. We demonstrate the effectiveness of our method with experiments on several datasets and variants of the problem.  ( 2 min )
    Shared Data and Algorithms for Deep Learning in Fundamental Physics. (arXiv:2107.00656v2 [cs.LG] UPDATED)
    We introduce a Python package that provides simple and unified access to a collection of datasets from fundamental physics research - including particle physics, astroparticle physics, and hadron and nuclear physics - for supervised machine learning studies. The datasets contain hadronic top quarks, cosmic-ray induced air showers, phase transitions in hadronic matter, and generator-level histories. While public datasets from multiple fundamental physics disciplines already exist, the common interface and provided reference models simplify future work on cross-disciplinary machine learning and transfer learning in fundamental physics. We discuss the design and structure and outline how additional datasets can be submitted for inclusion. As a showcase application, we present a simple yet flexible graph-based neural network architecture that can easily be applied to a wide range of supervised learning tasks. We show that our approach reaches performance close to dedicated methods on all datasets. To simplify adaptation for various problems, we provide easy-to-follow instructions on how graph-based representations of data structures, relevant for fundamental physics, can be constructed and provide code implementations for several of them. Implementations are also provided for our proposed method and all reference algorithms.  ( 2 min )
    Out-of-distribution Generalization with Causal Invariant Transformations. (arXiv:2203.11528v3 [stat.ML] UPDATED)
    In real-world applications, it is important and desirable to learn a model that performs well on out-of-distribution (OOD) data. Recently, causality has become a powerful tool to tackle the OOD generalization problem, with the idea resting on the causal mechanism that is invariant across domains of interest. To leverage the generally unknown causal mechanism, existing works assume a linear form of causal feature or require sufficiently many and diverse training domains, which are usually restrictive in practice. In this work, we obviate these assumptions and tackle the OOD problem without explicitly recovering the causal feature. Our approach is based on transformations that modify the non-causal feature but leave the causal part unchanged, which can be either obtained from prior knowledge or learned from the training data in the multi-domain scenario. Under the setting of invariant causal mechanism, we theoretically show that if all such transformations are available, then we can learn a minimax optimal model across the domains using only single domain data. Noticing that knowing a complete set of these causal invariant transformations may be impractical, we further show that it suffices to know only a subset of these transformations. Based on the theoretical findings, a regularized training procedure is proposed to improve the OOD generalization capability. Extensive experimental results on both synthetic and real datasets verify the effectiveness of the proposed algorithm, even with only a few causal invariant transformations.  ( 2 min )
    Knowledge Removal in Sampling-based Bayesian Inference. (arXiv:2203.12964v1 [cs.LG])
    The right to be forgotten has been legislated in many countries, but its enforcement in the AI industry would cause unbearable costs. When single data deletion requests come, companies may need to delete the whole models learned with massive resources. Existing works propose methods to remove knowledge learned from data for explicitly parameterized models, which however are not applicable to the sampling-based Bayesian inference, i.e., Markov chain Monte Carlo (MCMC), as MCMC can only infer implicit distributions. In this paper, we propose the first machine unlearning algorithm for MCMC. We first convert the MCMC unlearning problem into an explicit optimization problem. Based on this problem conversion, an {\it MCMC influence function} is designed to provably characterize the learned knowledge from data, which then delivers the MCMC unlearning algorithm. Theoretical analysis shows that MCMC unlearning would not compromise the generalizability of the MCMC models. Experiments on Gaussian mixture models and Bayesian neural networks confirm the effectiveness of the proposed algorithm. The code is available at \url{https://github.com/fshp971/mcmc-unlearning}.  ( 2 min )
    On the Search for Feedback in Reinforcement Learning. (arXiv:2002.09478v6 [cs.LG] UPDATED)
    The problem of Reinforcement Learning (RL) in an unknown nonlinear dynamical system is equivalent to the search for an optimal feedback law utilizing the simulations/rollouts of the dynamical system. Most RL techniques search over a complex global nonlinear feedback parametrization, making them suffer from high training times as well as variance. Instead, we advocate searching over a local feedback representation consisting of an open-loop sequence, and an associated optimal linear feedback law completely determined by the open-loop. We show that this alternate approach results in highly efficient training, the answers obtained are repeatable and hence reliable, and the resulting closed-loop performance is superior to global state-of-the-art RL techniques. Finally, replanning whenever required, which is feasible thanks to the fast and reliable local solution, allows us to recover the global optimality of the resulting feedback law.  ( 2 min )
    Your Policy Regularizer is Secretly an Adversary. (arXiv:2203.12592v2 [cs.LG] UPDATED)
    Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy. In this paper, we show how this robustness arises from hedging against worst-case perturbations of the reward function, which are chosen from a limited set by an imagined adversary. Using convex duality, we characterize this robust set of adversarial reward perturbations under KL and alpha-divergence regularization, which includes Shannon and Tsallis entropy regularization as special cases. Importantly, generalization guarantees can be given within this robust set. We provide detailed discussion of the worst-case reward perturbations, and present intuitive empirical examples to illustrate this robustness and its relationship with generalization. Finally, we discuss how our analysis complements and extends previous results on adversarial reward robustness and path consistency optimality conditions.  ( 2 min )
    A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization With Non-isolated Local Minima. (arXiv:2203.10973v2 [cs.LG] UPDATED)
    Non-convex loss functions arise frequently in modern machine learning, and for the theoretical analysis of stochastic optimization methods, the presence of non-isolated minima presents a unique challenge that has remained under-explored. In this paper, we study the local convergence of the stochastic gradient descent method to non-isolated global minima. Under mild assumptions, we estimate the probability for the iterations to stay near the minima by adopting the notion of stochastic stability. After establishing such stability, we present the lower bound complexity in terms of various error criteria for a given error tolerance $\epsilon$ and a failure probability $\gamma$.  ( 2 min )
    On the Applicability of ML Fairness Notions. (arXiv:2006.16745v3 [cs.LG] UPDATED)
    Fairness emerged as an important requirement to guarantee that Machine Learning (ML) predictive systems do not discriminate against specific individuals or entire sub-populations, in particular, minorities. Given the inherent subjectivity of the concept of fairness, several notions of fairness have been introduced in the literature. This paper is a survey that illustrates the subtleties between fairness notions through a large number of examples and scenarios. In addition, unlike other surveys in the literature, it addresses the question of: which notion of fairness is most suited to a given real-world scenario and why? Our attempt to answer this question consists in (1) identifying the set of fairness-related characteristics of the real-world scenario at hand, (2) analyzing the behavior of each fairness notion, and then (3) fitting these two elements to recommend the most suitable fairness notion in every specific setup. The results are summarized in a decision diagram that can be used by practitioners and policymakers to navigate the relatively large catalog of ML fairness notions.  ( 2 min )
    Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits. (arXiv:2109.09855v2 [cs.LG] UPDATED)
    We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed R(MA)^2B. The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward of pulling an arm depends on both the current state of the corresponding MDP and the action taken. The goal is to sequentially choose actions for arms so as to maximize the expected value of the cumulative rewards collected. Since finding the optimal policy is typically intractable, we propose a computationally appealing index policy which we call Occupancy-Measured-Reward Index Policy. Our policy is well-defined even if the underlying MDPs are not indexable. We prove that it is asymptotically optimal when the activation budget and number of arms are scaled up, while keeping their ratio as a constant. For the case when the system parameters are unknown, we develop a learning algorithm. Our learning algorithm uses the principle of optimism in the face of uncertainty and further uses a generative model in order to fully exploit the structure of Occupancy-Measured-Reward Index Policy. We call it the R(MA)^2B-UCB algorithm. As compared with the existing algorithms, R(MA)^2B-UCB performs close to an offline optimum policy, and also achieves a sub-linear regret with a low computational complexity. Experimental results show that R(MA)^2B-UCB outperforms the existing algorithms in both regret and run time.  ( 2 min )
    k-Rater Reliability: The Correct Unit of Reliability for Aggregated Human Annotations. (arXiv:2203.12913v1 [cs.AI])
    Since the inception of crowdsourcing, aggregation has been a common strategy for dealing with unreliable data. Aggregate ratings are more reliable than individual ones. However, many natural language processing (NLP) applications that rely on aggregate ratings only report the reliability of individual ratings, which is the incorrect unit of analysis. In these instances, the data reliability is under-reported, and a proposed k-rater reliability (kRR) should be used as the correct data reliability for aggregated datasets. It is a multi-rater generalization of inter-rater reliability (IRR). We conducted two replications of the WordSim-353 benchmark, and present empirical, analytical, and bootstrap-based methods for computing kRR on WordSim-353. These methods produce very similar results. We hope this discussion will nudge researchers to report kRR in addition to IRR.  ( 2 min )
    On the Kullback-Leibler divergence between pairwise isotropic Gaussian-Markov random fields. (arXiv:2203.13164v1 [cs.IT])
    The Kullback-Leibler divergence or relative entropy is an information-theoretic measure between statistical models that plays an important role in measuring a distance between random variables. In the study of complex systems, random fields are mathematical structures that model the interaction between these variables by means of an inverse temperature parameter, responsible for controlling the spatial dependence structure along the field. In this paper, we derive closed-form expressions for the Kullback-Leibler divergence between two pairwise isotropic Gaussian-Markov random fields in both univariate and multivariate cases. The proposed equation allows the development of novel similarity measures in image processing and machine learning applications, such as image denoising and unsupervised metric learning.  ( 2 min )
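    The multivariate Gaussian KL divergence that such field-level expressions build on has a standard closed form; a minimal numpy sketch (the helper name `gaussian_kl` is illustrative, not from the paper):

    ```python
    import numpy as np

    def gaussian_kl(mu0, cov0, mu1, cov1):
        """KL( N(mu0, cov0) || N(mu1, cov1) ) for d-dimensional Gaussians."""
        d = mu0.shape[0]
        cov1_inv = np.linalg.inv(cov1)
        diff = mu1 - mu0
        term_trace = np.trace(cov1_inv @ cov0)          # tr(Sigma1^-1 Sigma0)
        term_quad = diff @ cov1_inv @ diff              # Mahalanobis term
        term_logdet = np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
        return 0.5 * (term_trace + term_quad - d + term_logdet)

    # KL between identical distributions is zero; shifting the mean by 1
    # in 1D with unit variance gives KL = 0.5.
    print(gaussian_kl(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2)))  # 0.0
    print(gaussian_kl(np.array([0.0]), np.eye(1), np.array([1.0]), np.eye(1)))  # 0.5
    ```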
    Learning the Dynamics of Autonomous Linear Systems From Multiple Trajectories. (arXiv:2203.12794v1 [eess.SY])
    We consider the problem of learning the dynamics of autonomous linear systems (i.e., systems that are not affected by external control inputs) from observations of multiple trajectories of those systems, with finite sample guarantees. Existing results on learning rate and consistency of autonomous linear system identification rely on observations of steady state behaviors from a single long trajectory, and are not applicable to unstable systems. In contrast, we consider the scenario of learning system dynamics based on multiple short trajectories, where there are no easily observed steady state behaviors. We provide a finite sample analysis, which shows that the dynamics can be learned at a rate $\mathcal{O}(\frac{1}{\sqrt{N}})$ for both stable and unstable systems, where $N$ is the number of trajectories, when the initial state of the system has zero mean (which is a common assumption in the existing literature). We further generalize our result to the case where the initial state has non-zero mean. We show that one can adjust the length of the trajectories to achieve a learning rate of $\mathcal{O}(\sqrt{\frac{\log N}{N}})$ for strictly stable systems and a learning rate of $\mathcal{O}(\frac{(\log{N})^d}{\sqrt{N}})$ for marginally stable systems, where $d$ is some constant.  ( 2 min )
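    As a toy illustration of the setting (not the authors' estimator or rates), the dynamics x_{t+1} = A x_t + noise can be recovered by ordinary least squares over state pairs pooled from many short trajectories, even when one mode is unstable:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    A_true = np.array([[0.9, 0.2], [0.0, 1.1]])  # one stable, one unstable mode

    # Simulate N short trajectories of length T with zero-mean initial states.
    N, T = 500, 5
    X, Y = [], []
    for _ in range(N):
        x = rng.normal(size=2)
        for _ in range(T):
            x_next = A_true @ x + 0.01 * rng.normal(size=2)
            X.append(x)
            Y.append(x_next)
            x = x_next

    # Pooled ordinary least squares: solve X A^T ~= Y across all pairs.
    X, Y = np.array(X), np.array(Y)
    A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
    print(np.max(np.abs(A_hat - A_true)))  # small estimation error
    ```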
    Multi-armed bandits for online optimization of language model pre-training: the use case of dynamic masking. (arXiv:2203.13151v1 [cs.CL])
    Transformer-based language models (TLMs) provide state-of-the-art performance in many modern natural language processing applications. TLM training is conducted in two phases. First, the model is pre-trained over large volumes of text to minimize a generic objective function, such as the Masked Language Model (MLM). Second, the model is fine-tuned in specific downstream tasks. Pre-training requires large volumes of data and high computational resources, while introducing many still unresolved design choices. For instance, selecting hyperparameters for language model pre-training is often carried out based on heuristics or grid-based searches. In this work, we propose a multi-armed bandit-based online optimization framework for the sequential selection of pre-training hyperparameters to optimize language model performance. We pose the pre-training procedure as a sequential decision-making task, where at each pre-training step, an agent must determine what hyperparameters to use towards optimizing the pre-training objective. We propose a Thompson sampling bandit algorithm, based on a surrogate Gaussian process reward model of the MLM pre-training objective, for its sequential minimization. We empirically show how the proposed Gaussian process based Thompson sampling pre-trains robust and well-performing language models. Namely, by sequentially selecting masking hyperparameters of the TLM, we achieve satisfactory performance in fewer epochs, not only in terms of the pre-training MLM objective, but in diverse downstream fine-tuning tasks. The proposed bandit-based technique provides an automated hyperparameter selection method for pre-training TLMs of interest to practitioners. In addition, our results indicate that, instead of MLM pre-training with fixed masking probabilities, sequentially adapting the masking hyperparameters improves both pre-training loss and downstream task metrics.  ( 2 min )
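    The sequential-selection idea can be sketched with a toy Thompson-sampling bandit over discrete masking probabilities; the arms, simulated rewards, and simple Gaussian posterior below are illustrative stand-ins, not the paper's GP surrogate:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical setup: choose a masking probability at each "pre-training
    # step"; the reward (e.g. negated MLM loss improvement) is simulated here.
    arms = [0.10, 0.15, 0.20, 0.25]
    true_reward = {0.10: 0.2, 0.15: 0.5, 0.20: 0.4, 0.25: 0.1}

    # Gaussian Thompson sampling with a running-mean posterior per arm.
    counts = np.zeros(len(arms))
    means = np.zeros(len(arms))
    for step in range(2000):
        # Sample a plausible mean reward per arm; posterior width shrinks
        # as an arm is pulled more often.
        sampled = rng.normal(means, 1.0 / np.sqrt(counts + 1))
        a = int(np.argmax(sampled))
        r = true_reward[arms[a]] + 0.1 * rng.normal()
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]

    print(arms[int(np.argmax(counts))])  # converges on the best masking rate
    ```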
    Extended critical regimes of deep neural networks. (arXiv:2203.12967v1 [cs.LG])
    Deep neural networks (DNNs) have been successfully applied to many real-world problems, but a complete understanding of their dynamical and computational principles is still lacking. Conventional theoretical frameworks for analysing DNNs often assume random networks with coupling weights obeying Gaussian statistics. However, non-Gaussian, heavy-tailed coupling is a ubiquitous phenomenon in DNNs. Here, by weaving together theories of heavy-tailed random matrices and non-equilibrium statistical physics, we develop a new type of mean field theory for DNNs which predicts that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters. In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers. We further elucidate that the extended criticality endows DNNs with profound computational advantages: balancing the contraction as well as expansion of internal neural representations and speeding up training processes, hence providing a theoretical guide for the design of efficient neural architectures.  ( 2 min )
    Dynamically-Scaled Deep Canonical Correlation Analysis. (arXiv:2203.12377v2 [cs.LG] UPDATED)
    Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them. Several variants of CCA have been introduced in the literature, in particular, variants based on deep neural networks for learning highly correlated nonlinear transformations of two views. As these models are parameterized conventionally, their learnable parameters remain independent of the inputs after the training process, which may limit their capacity for learning highly correlated representations. We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model. In our deep-CCA models, the parameters of the last layer are scaled by a second neural network that is conditioned on the model's input, resulting in a parameterization that is dependent on the input samples. We evaluate our model on multiple datasets and demonstrate that the learned representations are more correlated in comparison to the conventionally-parameterized CCA-based models and also obtain preferable retrieval results. Our code is available at https://github.com/tomerfr/DynamicallyScaledDeepCCA.  ( 2 min )
    Accurate Shapley Values for explaining tree-based models. (arXiv:2106.03820v2 [stat.ML] UPDATED)
    Although Shapley Values (SV) are widely used in explainable AI, they can be poorly understood and estimated, implying that their analysis may lead to spurious inferences and explanations. As a starting point, we recall an invariance principle for SV and derive the correct approach for computing the SV of categorical variables that are particularly sensitive to the encoding used. In the case of tree-based models, we introduce two estimators of Shapley Values that exploit the tree structure efficiently and are more accurate than state-of-the-art methods. Simulations and comparisons are performed with state-of-the-art algorithms and show the practical gain of our approach. Finally, we discuss the ability of SV to provide reliable local explanations. We also provide a Python package that computes our estimators at https://github.com/salimamoukou/acv00.  ( 2 min )
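    For intuition on what such estimators approximate, exact Shapley values can be computed by brute-force coalition enumeration on tiny problems; this generic sketch (not the paper's tree-specific estimators, which live in the linked package) is useful as a ground truth when validating faster methods:

    ```python
    import itertools
    import math

    def shapley_values(value_fn, n_features):
        """Exact Shapley values by enumerating all coalitions.

        value_fn maps a frozenset of feature indices to the model's value
        when only those features are 'present'. Exponential cost, so only
        feasible for small n_features.
        """
        phis = []
        for i in range(n_features):
            others = [j for j in range(n_features) if j != i]
            phi = 0.0
            for r in range(len(others) + 1):
                for s in itertools.combinations(others, r):
                    s = frozenset(s)
                    # Classic Shapley weight: |S|! (n - |S| - 1)! / n!
                    w = (math.factorial(len(s)) * math.factorial(n_features - len(s) - 1)
                         / math.factorial(n_features))
                    phi += w * (value_fn(s | {i}) - value_fn(s))
            phis.append(phi)
        return phis

    # Additive toy model: v(S) = sum of x_j for j in S, with x = (1, 2, 3).
    x = [1.0, 2.0, 3.0]
    print(shapley_values(lambda s: sum(x[j] for j in s), 3))  # [1.0, 2.0, 3.0]
    ```

    On an additive model, each feature's Shapley value recovers exactly its own contribution, which makes this a convenient sanity check.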
    Two Stage Curvature Identification with Machine Learning: Causal Inference with Possibly Invalid Instrumental Variables. (arXiv:2203.12808v1 [stat.ME])
    Instrumental variables regression is a popular causal inference method for endogenous treatment. A significant concern in practical applications is the validity and strength of instrumental variables. This paper aims to perform causal inference when all instruments are possibly invalid. To do this, we propose a novel methodology called two stage curvature identification (TSCI) together with a generalized concept to measure the strengths of possibly invalid instruments: such invalid instruments can still be used for inference in our framework. We fit the treatment model with a general machine learning method and propose a novel bias correction method to remove the overfitting bias from machine learning methods. Among a collection of spaces of violation functions, we choose the best one by evaluating invalid instrumental variables' strength. We demonstrate our proposed TSCI methodology in a large-scale simulation study and revisit the important economics question on the effect of education on earnings.  ( 2 min )
    Towards All-Purpose Domain Adaptation Under Confounding. (arXiv:2203.12720v1 [stat.ML])
    Current domain adaptation methods address the problems of covariate shift or label shift, but are not applicable to the setting where they occur simultaneously and interact with each other. In this paper, we propose an assumption, confounded shift, to begin to address this problem. We also propose a framework for this task, based on minimizing the expected divergence between the source and target conditional distributions. Within this framework, we propose using the reverse KL divergence, demonstrating the use of both parametric linear Gaussian and nonparametric nonlinear Gaussian Process estimators of the conditional distribution. We also propose using the Maximum Mean Discrepancy (MMD) within our framework. To make confounded domain adaptation with the MMD effective, we propose an intelligent dynamic strategy for choosing the kernel bandwidth, which may be of independent interest even outside of the confounded shift context. Finally, we show that our approach is advantageous on a variety of synthetic and real datasets.  ( 2 min )
    Is Fairness Only Metric Deep? Evaluating and Addressing Subgroup Gaps in Deep Metric Learning. (arXiv:2203.12748v1 [cs.LG])
    Deep metric learning (DML) enables learning with less supervision through its emphasis on the similarity structure of representations. There has been much work on improving generalization of DML in settings like zero-shot retrieval, but little is known about its implications for fairness. In this paper, we are the first to evaluate state-of-the-art DML methods trained on imbalanced data, and to show the negative impact these representations have on minority subgroup performance when used for downstream tasks. In this work, we first define fairness in DML through an analysis of three properties of the representation space -- inter-class alignment, intra-class alignment, and uniformity -- and propose finDML, the fairness in non-balanced DML benchmark to characterize representation fairness. Utilizing finDML, we find bias in DML representations to propagate to common downstream classification tasks. Surprisingly, this bias is propagated even when training data in the downstream task is re-balanced. To address this problem, we present Partial Attribute De-correlation (PARADE) to de-correlate feature representations from sensitive attributes and reduce performance gaps between subgroups in both embedding space and downstream metrics.  ( 2 min )
    Adaptive Regularization of B-Spline Models for Scientific Data. (arXiv:2203.12730v1 [stat.ML])
    B-spline models are a powerful way to represent scientific data sets with a functional approximation. However, these models can suffer from spurious oscillations when the data to be approximated are not uniformly distributed. Model regularization (i.e., smoothing) has traditionally been used to minimize these oscillations; unfortunately, it is sometimes impossible to sufficiently remove unwanted artifacts without smoothing away key features of the data set. In this article, we present a method of model regularization that preserves significant features of a data set while minimizing artificial oscillations. Our method varies the strength of a smoothing parameter throughout the domain automatically, removing artifacts in poorly-constrained regions while leaving other regions unchanged. The behavior of our method is validated on a collection of two- and three-dimensional data sets produced by scientific simulations.  ( 2 min )
    Possibility Before Utility: Learning And Using Hierarchical Affordances. (arXiv:2203.12686v1 [cs.LG])
    Reinforcement learning algorithms struggle on tasks with complex hierarchical dependency structures. Humans and other intelligent agents do not waste time assessing the utility of every high-level action in existence, but instead only consider ones they deem possible in the first place. By focusing only on what is feasible, or "afforded", at the present moment, an agent can spend more time both evaluating the utility of and acting on what matters. To this end, we present Hierarchical Affordance Learning (HAL), a method that learns a model of hierarchical affordances in order to prune impossible subtasks for more effective learning. Existing works in hierarchical reinforcement learning provide agents with structural representations of subtasks but are not affordance-aware, and by grounding our definition of hierarchical affordances in the present state, our approach is more flexible than the multitude of approaches that ground their subtask dependencies in a symbolic history. While these logic-based methods often require complete knowledge of the subtask hierarchy, our approach is able to utilize incomplete and varying symbolic specifications. Furthermore, we demonstrate that relative to non-affordance-aware methods, HAL agents are better able to efficiently learn complex tasks, navigate environment stochasticity, and acquire diverse skills in the absence of extrinsic supervision -- all of which are hallmarks of human learning.  ( 2 min )
    Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders. (arXiv:2203.12742v1 [cs.LG])
    Bayesian optimization is a gold standard for query-efficient continuous optimization. However, its adoption for drug and antibody sequence design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, enabling gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit trade-off over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on a small-molecule task based on the ZINC dataset and introduce a new large-molecule task targeting fluorescent proteins. In our experiments, LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that Bayesian optimization is practical and effective for biological sequence design.  ( 2 min )

  • Open

    NERFs To Make 3D As Simple As Shooting a Video
    Gaming, creating CGI movies, building shared worlds, and creating digital twins are exciting in principle, but the complexity of building 3D models usually serves to limit the ambition of even the most dedicated auteur. However, recent innovations by NVIDIA, announced earlier this year for their RTX 3090 line of GPUs, are very likely to change… The post NERFs To Make 3D As Simple As Shooting a Video appeared first on Data Science Central.  ( 3 min )
  • Open

    [P] keras-genetic: Train Keras Models Using Genetic Algorithms
    Hey r/machinelearning! Recently when working on a WorldModels implementation for keras.io I realized that I needed a genetic algorithm implementation to train the "controller" module. Instead of writing a one-off solution, I decided to write Keras Genetic, a full package to train keras models using genetic algorithms. Please note genetic algorithms are not good for training neural networks outside of some niche use cases, typically training a controller with <1k parameters. The ConvNet MNIST example scores *horribly* when compared to comparable backprop examples. Please give the package a try and let me know if you find this interesting or useful: https://github.com/lukewood/keras-genetic submitted by /u/puppet_pals [link] [comments]  ( 1 min )
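    For readers new to the approach, the core loop of a genetic algorithm (selection, reproduction, mutation) fits in a few lines of numpy; this is a generic illustration of the "tiny controller" use case, not the keras-genetic API:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Evolve a small parameter vector to maximize a fitness function --
    # the regime where gradient-free search is actually viable.
    target = np.array([0.5, -1.0, 2.0])
    fitness = lambda w: -np.sum((w - target) ** 2)

    pop = rng.normal(size=(50, 3))
    for gen in range(200):
        scores = np.array([fitness(w) for w in pop])
        elite = pop[np.argsort(scores)[-10:]]             # keep the 10 fittest
        parents = elite[rng.integers(0, 10, size=50)]     # resample parents
        pop = parents + 0.1 * rng.normal(size=(50, 3))    # mutate

    best = pop[np.argmax([fitness(w) for w in pop])]
    print(np.round(best, 1))  # close to the target vector
    ```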
    [D] Applying Keras ImageDataGenerator to the features AND labels of a dataset
    Hello, For a personal project, I intend to develop a little CNN - TransposeCNN model to colorize images of portraits. The idea is simple: from a grayscale image, produce the RGB version. To do so, I constructed a dataset of colored portraits and I would like to use Keras' ImageDataGenerator tool to make the feature the grayscale version and the label the original one. I could simply duplicate the current dataset and convert one of them into grayscale, and that may be easier to do, but then I would like to do something else: I would like to apply the same data augmentation functions from ImageDataGenerator (rotations, flipping...) to the features and their corresponding labels. Do you know if it is possible or will I have to construct the augmented dataset explicitly? Thank you for your advice submitted by /u/Arioxel_ [link] [comments]  ( 1 min )
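    One standard way to keep augmented features and labels aligned is to drive two augmentation pipelines from identically seeded random generators, which is also the usual Keras trick (pass the same `seed` to two `ImageDataGenerator.flow` calls, one on the features and one on the labels, and zip them). A minimal numpy sketch of the principle, with a toy flip augmentation:

    ```python
    import numpy as np

    def augment(batch, rng):
        """Toy augmentation: random horizontal flip per image, driven by rng."""
        flips = rng.random(len(batch)) < 0.5
        out = batch.copy()
        out[flips] = out[flips][:, :, ::-1]
        return out

    # Grayscale "features" and colored "labels" for 4 tiny 8x8 images.
    gray = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)
    color = gray * 2.0  # stand-in for the colored originals

    # Identically seeded generators make identical flip decisions, so each
    # grayscale input stays aligned with its colored target.
    seed = 42
    aug_gray = augment(gray, np.random.default_rng(seed))
    aug_color = augment(color, np.random.default_rng(seed))
    assert np.allclose(aug_gray * 2.0, aug_color)
    ```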
    What is the motivation of checkpoint averaging for Transformers? [R]
    According to the original "Attention is all you need" paper (Section 6), for the base models, they used a single model obtained by averaging the last 5 checkpoints, which were written at 10-minute intervals. For the big models, we averaged the last 20 checkpoints. Is it about improving the performance? But if other works didn't do the checkpoint averaging, it wouldn't be a fair comparison. However, I seldom see the recent transformer works highlighting this technique. What's more, I have not heard of any ViT (Vision Transformer) works utilizing such a checkpoint averaging trick. My background is computer vision so I was wondering if it makes sense to try this... Could someone provide some guidance on this? Thanks. submitted by /u/AaronSpalding [link] [comments]  ( 2 min )
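    Mechanically, checkpoint averaging just averages corresponding weight tensors across the last k saved checkpoints and loads the result back into the model; a minimal sketch, assuming weights are stored as lists of arrays (e.g. from Keras `model.get_weights()`):

    ```python
    import numpy as np

    def average_checkpoints(checkpoints):
        """Average corresponding weight tensors across saved checkpoints.

        `checkpoints` is a list of checkpoints, each a list of numpy weight
        tensors in the same order. Returns one averaged weight list.
        """
        n = len(checkpoints)
        return [sum(tensors) / n for tensors in zip(*checkpoints)]

    # Three toy "checkpoints", each with two weight tensors.
    ckpts = [
        [np.full((2, 2), 1.0), np.array([0.0])],
        [np.full((2, 2), 2.0), np.array([3.0])],
        [np.full((2, 2), 3.0), np.array([6.0])],
    ]
    avg = average_checkpoints(ckpts)
    print(avg[0][0, 0], avg[1][0])  # 2.0 3.0
    ```

    The usual motivation is that the average of nearby checkpoints lands in a flatter region of the loss surface than any single checkpoint, which tends to improve test performance slightly.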
    [D] Improving on time-averaging RNN or Transformer features for sequence classification? (few sample regime)
    Currently, for a sequence classification problem I'm using a pretrained network (Transformer or RNN) as a feature extractor. I average across time to obtain a d-dimensional vector per training example and train a classifier on top of these feature vectors. I see two improvements over simple averaging of the features: Add an average pooling layer and train the network end-to-end. Add a weighted average pooling layer and train the network end-to-end. Are there any ways that I can do better here? I am under sample size limitations: samples per class can be as low as 20, total data set size under 300, with 2-3 classes. I've tried method 1 and it barely changes the classification accuracy. submitted by /u/PK_thundr [link] [comments]  ( 1 min )
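    The weighted-average option is often implemented as softmax attention over time steps; a minimal numpy sketch with a single learnable scoring vector `w` (one illustrative parameterization among several):

    ```python
    import numpy as np

    def attention_pool(features, w):
        """Learned weighted average over time.

        features: (T, d) sequence of extracted feature vectors.
        w: (d,) scoring vector (the learnable parameter); a softmax over
        per-step scores gives the pooling weights.
        """
        scores = features @ w                       # (T,) per-step scores
        scores = scores - scores.max()              # numerical stability
        alphas = np.exp(scores) / np.exp(scores).sum()
        return alphas @ features                    # (d,) pooled vector

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(10, 4))

    # With w = 0 every step gets weight 1/T, recovering plain time-averaging,
    # so the model can only improve on (or match) the mean-pooling baseline.
    assert np.allclose(attention_pool(feats, np.zeros(4)), feats.mean(axis=0))
    ```

    With only ~20 samples per class, keeping `w` low-dimensional (or freezing everything else) is usually the safer bet than full end-to-end fine-tuning.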
    [D] ML CPU Benchmarking - When to upgrade CPU?
Are there any standard benchmarking tools or charts that compare CPUs by how much faster one performs than another for ML training/prediction? For example, a "5600X" will train models 30% faster than a "2600X", etc., which may mean seconds saved on a small model or hours on a larger one. In the example above, I've actually had a 2600X for the last 2 years and am considering whether to buy a 5600X, wait for an AM5 CPU later this year, or switch back to Intel with a 12600K, although Intel is at least $200 more because their motherboards are overpriced. It would be nice to know whether an upgrade would save me an incredible amount of time, so I can justify gaining experience quicker, or whether the gain is marginal and I should save the money for a nice 3080 GPU and get into deep learning sooner. submitted by /u/bugsysiegals [link] [comments]  ( 1 min )
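Absent a standard chart, one crude do-it-yourself probe (an assumption-laden sketch, not an established benchmark) is to time the kernel that dominates most ML training on each machine and compare wall-times directly:

```python
import time
import numpy as np

def matmul_seconds(n=512, reps=3):
    """Average wall-time of an n x n matrix multiply, a rough proxy
    for per-CPU ML throughput. Run on each machine and compare."""
    rng = np.random.default_rng(0)
    a = rng.normal(size=(n, n))
    b = rng.normal(size=(n, n))
    t0 = time.perf_counter()
    for _ in range(reps):
        a @ b
    return (time.perf_counter() - t0) / reps

print(f"mean matmul time: {matmul_seconds():.4f} s")
```

This only captures dense linear algebra; data loading, preprocessing, and single-thread bottlenecks can shift real-world ratios considerably.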
    [D] Video Paper Review - Typical Decoding for Natural Language Generation (More human-like sampling from language models)
https://youtu.be/_EDr3ryrT_Y Modern language models like T5 or GPT-3 achieve remarkably low perplexities on both training and validation data, yet when sampling from their output distributions, the generated text often seems dull and uninteresting. Various workarounds have been proposed, such as top-k sampling and nucleus sampling, but while these manage to somewhat improve the generated samples, they are hacky and unfounded. This paper introduces typical sampling, a new decoding method that is principled, effective, and can be implemented efficiently. Typical sampling turns away from sampling purely based on likelihood and explicitly finds a trade-off between generating high-probability samples and generating high-information samples. The paper connects typical sampling to psycholinguistic theories on human speech generation, and shows experimentally that typical sampling achieves much more diverse and interesting results than any of the current methods. OUTLINE: 0:00 - Intro 1:50 - Sponsor: Fully Connected by Weights & Biases 4:10 - Paper Overview 7:40 - What's the problem with sampling? 11:45 - Beam Search: The good and the bad 14:10 - Top-k and Nucleus Sampling 16:20 - Why the most likely things might not be the best 21:30 - The expected information content of the next word 25:00 - How to trade off information and likelihood 31:25 - Connections to information theory and psycholinguistics 36:40 - Introducing Typical Sampling 43:00 - Experimental Evaluation 44:40 - My thoughts on this paper Paper: https://arxiv.org/abs/2202.00666 Code: https://github.com/cimeister/typical-sampling submitted by /u/ykilcher [link] [comments]  ( 1 min )
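The selection rule described above — keep the tokens whose surprisal is closest to the distribution's conditional entropy, up to cumulative mass tau — can be sketched in NumPy. This is a paraphrase of the paper's procedure, not the reference implementation in the linked repo:

```python
import numpy as np

def typical_filter(logits, tau=0.95):
    """Renormalized distribution over the 'typical set': tokens whose
    surprisal is nearest the entropy, covering probability mass tau."""
    logits = logits - logits.max()
    p = np.exp(logits) / np.exp(logits).sum()
    surprisal = -np.log(p)
    entropy = (p * surprisal).sum()
    order = np.argsort(np.abs(surprisal - entropy))  # most typical first
    cum = np.cumsum(p[order])
    keep = order[: np.searchsorted(cum, tau) + 1]
    mask = np.zeros_like(p)
    mask[keep] = p[keep]
    return mask / mask.sum()

probs = typical_filter(np.log(np.array([0.5, 0.3, 0.15, 0.05])))
# low-probability outliers are zeroed, the rest renormalized to sum to 1
```

Note how this differs from top-k/nucleus: a very high-probability token can be dropped if its surprisal is far below the entropy, which is exactly the "most likely is not most interesting" point from the video.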
    [D] What's the MVM (minimum viable model) for node classification?
One core idea in ML is to build a simple model first to establish a minimum baseline, and then find better models, refine the existing model, improve the data, etc. I'd say for computer vision, in most cases, a ResNet / EfficientNet will get a good result, given enough data. In NLP, if the task is very simple, Naive Bayes methods can be decent; if the task is harder, BERT will do the trick up to a certain level for many tasks. However, choosing the first model is not obvious for many graph-related tasks where a node has more than one feature. For example, in node classification problems, what's an easy-to-implement model that guarantees decent results? submitted by /u/adenml [link] [comments]  ( 1 min )
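One candidate MVM in this setting: a single step of neighbor-feature smoothing followed by the simplest possible classifier, which already exploits both topology and features without any GNN machinery. A toy sketch (the graph, features, and labels below are made up for illustration):

```python
import numpy as np

# Adjacency of a 4-node graph: two connected pairs, one pair per class.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.9], [-1.0], [-1.1]])   # one feature per node
y = np.array([0, 0, 1, 1])

# One smoothing step: average each node with its neighbors (self-loops added).
A_hat = A + np.eye(len(A))
X_smooth = A_hat @ X / A_hat.sum(1, keepdims=True)

# Nearest-class-mean classifier on the smoothed features.
means = np.stack([X_smooth[y == c].mean(0) for c in (0, 1)])
pred = np.argmin(((X_smooth[:, None] - means[None]) ** 2).sum(-1), axis=1)
```

In practice, swap the nearest-mean step for logistic regression; if that baseline is already strong, a GCN-style model is the natural next rung.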
[D] how do you defend the choice of ML algorithm depending on the distribution of the data?
    For example, when is it better to use decision trees instead of SVM or KNN, based on the underlying theory/distribution of the data? I would appreciate any empirical/theoretical advice or references. Thank you. submitted by /u/Redditagonist [link] [comments]  ( 3 min )
    [P] TorchMetrics -- How do we use it, and what's the difference between .update() and .forward()?
TorchMetrics is a really nice and convenient library that lets us compute the performance of models in an iterative fashion. It's designed with PyTorch (and PyTorch Lightning) in mind, but it is a general-purpose library compatible with other libraries and workflows. This iterative computation is useful if we want to track a model during iterative training or evaluation on minibatches (and optionally across multiple GPUs). In deep learning, that's essentially all the time. However, when using TorchMetrics, one common question is whether we should use .update() or .forward()? (And that's also a question I certainly had when I started using it.) Here's a hands-on example and explanation. https://sebastianraschka.com/blog/2022/torchmetrics.html submitted by /u/seraschka [link] [comments]  ( 1 min )
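The distinction can be mimicked with a toy metric class (a sketch of the pattern, not the real TorchMetrics API): update() only accumulates internal state and returns nothing, forward()/__call__ does that update and also returns the metric for the current batch, and compute() returns the aggregate over everything seen so far.

```python
class MeanMetric:
    """Toy running-mean metric following the update/forward/compute pattern."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def update(self, values):
        # accumulate state only; no return value
        self.total += sum(values)
        self.count += len(values)

    def __call__(self, values):
        # like .forward(): update state AND return the batch-level value
        self.update(values)
        return sum(values) / len(values)

    def compute(self):
        # aggregate over all batches seen so far
        return self.total / self.count

m = MeanMetric()
m.update([1.0, 3.0])       # silently accumulates
batch = m([5.0, 7.0])      # batch mean: 6.0
epoch = m.compute()        # running mean over all four values: 4.0
```

Rule of thumb implied by the pattern: use .forward() when you want per-batch values logged, .update() when only the epoch-level compute() matters (it skips the per-batch computation).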
    [D] Should expert opinion be a bigger part of the Machine Learning world?
I came across this Twitter thread, which shows some interesting results when trying to recolor historical photos that have been decolored. The recolored photos are a lot drabber than the originals, and the thread author says that this gives us a skewed view of the past, making us think it was a lot more boring than it was and downplaying how vibrant and diverse certain societies were. https://preview.redd.it/fctveek4cjp81.png?width=3064&format=png&auto=webp&s=9aadecef7c12a4dd13f5197a69ffd155084be75e This made me think of some questions that I thought could lead to a good discussion here. I've put some below in no particular order! Do you think that expert opinion should be consulted more in the machine learning process? If so, where (perhaps omitting expert systems)? Is there too much faith that a result from an ML model is the "right" result (a phenomenon that maybe isn't specific to ML but a result of human tendencies)? Do ML practitioners have a responsibility to clearly communicate to the general public the limitations of and degree of confidence in these systems? Am I reading too much into this: is this colorization model just a fun model to play with, and are the conclusions of the Twitter thread too speculative or conjectural? Is this colorization issue just another form of bias that needs to be ironed out? The thread concludes by saying that colorization should be left to experts who can use context to pick accurate colors. I think this is too extreme, and that ML systems can incorporate expertise during training or afterward during evaluation. Do you think there are any jobs/problems that ML methods could be applied to but should be left to experts (some considerations might be safety, privacy, ethics, etc.)? I know that ultimately a lot of these questions can simply boil down to statistics and their interpretation, so I'm not sure exactly where discussion could/should/will lead, but I'm looking forward to hearing your opinions :) submitted by /u/SleekEagle [link] [comments]  ( 7 min )
    Build a mental health machine learning risk model using Amazon SageMaker Data Wrangler
    This post is co-written by Shibangi Saha, Data Scientist, and Graciela Kravtzov, Co-Founder and CTO, of Equilibrium Point. Many individuals are experiencing new symptoms of mental illness, such as stress, anxiety, depression, substance use, and post-traumatic stress disorder (PTSD). According to Kaiser Family Foundation, about half of adults (47%) nationwide have reported negative mental health […]  ( 8 min )
    Improve search accuracy with Spell Checker in Amazon Kendra
    Amazon Kendra is an intelligent search service powered by machine learning. You can receive spelling suggestions for misspelled terms in your queries by utilizing the Amazon Kendra Spell Checker. Spell Checker helps reduce the frequency of queries returning irrelevant results by providing spelling suggestions for unrecognized terms. In this post, we explore how to use […]  ( 4 min )
    I wrote a GPT-3 based web application to help myself write more effectively - and it worked!
    submitted by /u/data-gig [link] [comments]
    my meme generating AI just came up with this (not technically AI)
    submitted by /u/snoggel [link] [comments]
    This Latest Paper From Twitter and Oxford Research Shows That Feature Propagation is an Efficient and Scalable Approach for Handling Missing Features in Graph Machine Learning Applications
Graph Neural Networks (GNNs) have proved to be effective in a wide range of problems and fields. GNNs commonly use a message-passing mechanism, in which nodes communicate feature representations ("messages") to their neighbors at each layer. Each node's feature representation is initialized to its original features and is updated at each layer by aggregating incoming messages from neighbors. GNNs are distinguished from purely topological learning systems, such as random walks or label propagation, by their ability to mix topological and feature information, which is arguably what contributes to their success. Typically, GNN models assume a fully observed feature matrix, with rows representing nodes and columns representing channels. In real-world circumstances, however, each feature is frequently observable only for a subset of nodes. Demographic information, for example, may be exposed to only a small percentage of social network users, while content features are typically only available for the most active users. Continue Reading Paper: https://arxiv.org/pdf/2111.12128.pdf https://preview.redd.it/nylt2m19fkp81.png?width=1024&format=png&auto=webp&s=639c61207d4bffaa4c67f97263fcd5527a849f85 submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
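The approach named in the headline can be sketched as iterative diffusion: repeatedly replace missing feature entries with the average of their neighbors' values while clamping observed entries to their known values. A minimal single-feature sketch (the toy graph and values below are assumptions for illustration, not from the paper):

```python
import numpy as np

# 4-node graph where nodes 1 and 2 are missing their feature value.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 3.0])            # raw feature values
known = np.array([True, False, False, True])  # observation mask

d_inv = 1.0 / A.sum(1)                        # inverse node degrees
x_hat = np.where(known, x, 0.0)               # init missing entries to 0
for _ in range(50):
    x_hat = d_inv * (A @ x_hat)               # diffuse along edges
    x_hat[known] = x[known]                   # clamp observed values
# missing nodes converge to the mean of their observed neighbors (2.0)
```

After this imputation, a standard GNN runs on the filled-in feature matrix as if it were fully observed.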
Is there an AI which is able to generate images that people would buy?
    submitted by /u/xXLisa28Xx [link] [comments]  ( 1 min )
    Artificial Intelligence Helps Cut Miss Rate of Colorectal Polyps
    submitted by /u/Beautiful-Credit-868 [link] [comments]
I want to create an A.I. to tell me what to do next, based on my geolocation and a wave of factors I will punch in (obviously I know nothing), all guided by a premise (e.g. I want to help save our species from destroying itself). Any thoughts?
    I don't really know how AI is done; I imagine with lots of code I don't know and loads of statistics I really don't know. Can someone out in the land of 1's and 0's point me in the right direction? submitted by /u/143openyourmind [link] [comments]  ( 1 min )
    Question about calculating Entropy of a Decision Tree
Hi! I hope this is the right subreddit... I am currently studying Electrical Engineering at university and studying for an exam that is (partly) about neural networks. I am struggling a bit with an example given about calculating the entropy of a decision tree and hope someone here can help me out. I have the following information (a table with the given dataset and the resulting tree). I am trying to understand how the calculated entropy values came about, since I get different answers when I try it myself. This is my way (for example, for "level"): p1 (Senior) = 5/14, p2 (Mid) = 4/14, p3 (Junior) = 5/14. Entropy = -(5/14*log_2(5/14)+4/14*log_2(4/14)+5/14*log_2(5/14)) = 1.577 (which makes no sense to me, since entropy should be between 0 and 1??) Thanks a lot for any help! submitted by /u/inc0mingst0rm [link] [comments]  ( 1 min )
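The calculation above is actually right, and the worry is misplaced: Shannon entropy is bounded by log2(k) for k classes, not by 1, so with three levels the maximum is log2(3) ≈ 1.585 bits. (Worked decision-tree examples often report a different, smaller number: the weighted average entropy of the child nodes, i.e. the conditional entropy used for information gain, which may explain the mismatch with the given values.) A quick check:

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a class-count distribution."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

# 'level' splits the 14 examples as Senior/Mid/Junior = 5/4/5
h = entropy([5, 4, 5])
# h is about 1.577 bits; with 3 classes the bound is log2(3) ~ 1.585,
# so a value above 1 is perfectly valid.
```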
    Can I use TensorFlow just for one function inside my PyTorch model?
There is a TensorFlow function that I cannot convert into PyTorch. Can I use that function from TensorFlow but still keep my entire architecture in PyTorch? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
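For the forward pass this is possible by hopping through NumPy: detach the torch tensor to a NumPy array, run the TensorFlow op, and convert back. The big caveat is that autograd will not flow through the external op unless you also write a backward pass (e.g. via a custom torch.autograd.Function). A minimal sketch, where `external_op` is a hypothetical stand-in for the TensorFlow call:

```python
import numpy as np

def external_op(arr):
    """Stand-in for the TensorFlow function (here: a simple clip)."""
    return np.clip(arr, 0.0, 1.0)

def bridge(arr):
    """In real code: x.detach().cpu().numpy() on the way in,
    torch.from_numpy(...) on the way out. Gradients do NOT flow
    through this hop unless a backward pass is implemented."""
    return external_op(arr)

out = bridge(np.array([-0.5, 0.3, 1.7]))
```

If the function sits in the middle of the network and must be differentiable, reimplementing it with torch ops (or wrapping it in an autograd.Function with a hand-written gradient) is the safer route.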
How do DQN and DDQN learn to skip an action that gives a small reward now in favor of an action that gives a bigger reward in the future?
    To clarify: would deep Q-networks and double Q-networks ever discover such actions if they always favor the higher reward in the short term? What if another action has to be performed that may give a loss (potentially a significant loss) in the short term but in fact sets a path toward a greater reward? Are there any papers I could look at? submitted by /u/clockface99 [link] [comments]  ( 3 min )
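The mechanism is the bootstrapped target r + gamma * max_a' Q(s', a'): value from a later big reward flows backward into the earlier state's Q-value, so deferring can end up with the higher Q-value even if its immediate reward is zero. A toy tabular Q-learning run showing this (a sketch; DQN replaces the table with a network but uses the same target):

```python
import random

# State 0: action 0 pays +1 and ends; action 1 pays 0 but reaches state 1,
# where the only action pays +10. Discounting makes deferral win: 0.9*10 > 1.
random.seed(0)
Q = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0}
gamma, alpha = 0.9, 0.5

for _ in range(500):
    s = 0
    while True:
        a = random.choice([a2 for (s2, a2) in Q if s2 == s])  # explore
        if s == 0 and a == 0:
            r, s_next, done = 1.0, None, True
        elif s == 0 and a == 1:
            r, s_next, done = 0.0, 1, False
        else:                                   # s == 1, only action
            r, s_next, done = 10.0, None, True
        target = r if done else r + gamma * max(
            Q[(s_next, a2)] for (s2, a2) in Q if s2 == s_next)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        if done:
            break
        s = s_next

# Q(0, defer) converges near 9.0, beating Q(0, greedy) near 1.0
```

The exploration policy (here, uniformly random) is what lets the agent ever try the short-term-worse action; the Sutton & Barto textbook's chapters on TD learning cover exactly this credit-assignment story.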
    How to apply Deep RL in Arcade Learning Environment?
I am new to RL but have experience with deep learning. Would you please point me in the right direction as to how I can apply deep reinforcement learning in the Arcade Learning Environment? I also have basic knowledge of the OpenAI Gym environment. submitted by /u/AvailableBike9260 [link] [comments]  ( 1 min )
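One practical route (worth verifying against current docs) is the Gym/Gymnasium wrappers around ALE, e.g. environments like "ALE/Breakout-v5"; whatever the backend, the training loop has the same shape. A self-contained sketch with a stand-in environment (DummyEnv and the random policy are placeholders for the real ALE env and a learned DQN policy):

```python
import random

class DummyEnv:
    """Placeholder for an ALE environment: same reset/step interface shape."""
    def reset(self):
        self.t = 0
        return [0.0]                          # observation

    def step(self, action):
        self.t += 1
        obs, reward = [float(self.t)], 1.0
        done = self.t >= 5                    # short fixed-length episode
        return obs, reward, done

def policy(obs, n_actions=4):
    # replace with argmax over your DQN's Q-values (plus epsilon-greedy)
    return random.randrange(n_actions)

env = DummyEnv()
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, r, done = env.step(policy(obs))
    total += r                                # episode return: 5.0 here
```

For Atari specifically, the standard preprocessing (frame stacking, grayscale, reward clipping) matters as much as the agent; the DQN Nature paper and the Gymnasium Atari docs describe those wrappers.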
    What Is a Transformer Model?
    If you want to ride the next big wave in AI, grab a transformer. They’re not the shape-shifting toy robots on TV or the trash-can-sized tubs on telephone poles. So, What’s a Transformer Model? A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the Read article > The post What Is a Transformer Model? appeared first on NVIDIA Blog.  ( 9 min )
    NVIDIA Research Turns 2D Photos Into 3D Scenes in the Blink of an AI
    When the first instant photo was taken 75 years ago with a Polaroid camera, it was groundbreaking to rapidly capture the 3D world in a realistic 2D image. Today, AI researchers are working on the opposite: turning a collection of still images into a digital 3D scene in a matter of seconds. Known as inverse Read article > The post NVIDIA Research Turns 2D Photos Into 3D Scenes in the Blink of an AI appeared first on NVIDIA Blog.  ( 4 min )
2022-04-24T00:52:26.190Z osmosfeed 1.14.4